User-adaptive video telephony

ABSTRACT

A device may control a video communication via transcoding and/or traffic shaping. The device may include a multipoint control unit (MCU) and/or a server. The device may receive one or more video streams from one or more devices. The device may analyze a received video stream to determine a viewing parameter. The viewing parameter may include a user viewing parameter, a device viewing parameter, and/or a content viewing parameter. The device may modify a video stream based on the viewing parameter. Modifying the video stream may include re-encoding the video stream, adjusting an orientation, removing a video detail, and/or adjusting a bit rate. The device may send the modified video stream to another device. The device may determine a bit rate for the video stream based on the viewing parameter. The device may indicate the bit rate by sending a feedback message and/or by signaling a bandwidth limit.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 61/871,836, filed Aug. 29, 2013; and U.S. Provisional Patent Application No. 61/944,548, filed Feb. 25, 2014; the contents of which are incorporated by reference herein.

BACKGROUND

Video telephony is a growing segment of the traffic carried over wireless networks. This trend is expected to continue, as evidenced by the introduction of support for delivery of Apple's FaceTime technology over LTE networks. Video telephony systems may be integrated into web browsers without the need for third-party plugins. Mobile video telephony systems may not take visual links into account in the capture and processing of the video data.

Real-time video telephony over wireless networks may be characterized by significant bandwidth and latency requirements. Because of the low tolerance to latency in interactive sessions such as video chat, buffering at the receivers may be quite limited. The video decoder may be exposed to the dynamics of the channel characteristics. Some video telephony systems are not robust or reliable in the context of the dynamic wireless channel. Transient congestion and/or temporarily large packet latencies may contribute to poor reliability. In wireless networks, there is often a tradeoff between latency and bandwidth.

SUMMARY

Systems, methods, and instrumentalities are provided for controlling a video communication. A device may control a video communication via transcoding. The device may include a multipoint control unit (MCU). The device may receive a first video stream from a first device and a second video stream from a second device. The device may receive a third video stream from a third device. The device may receive a fourth video stream from the second device. The device may analyze the first video stream to determine a first viewing parameter associated with the first device. The device may analyze the second video stream to determine a second viewing parameter associated with the second device. The device may analyze the third video stream to determine a third viewing parameter associated with the third device. The viewing parameter may include a user viewing parameter, a device viewing parameter, and/or a content viewing parameter. The device may modify the second video stream based on the first viewing parameter and/or the third viewing parameter. The device may modify the first video stream based on the third viewing parameter and/or the second viewing parameter. The device may modify the fourth video stream based on the third viewing parameter. Modifying the video stream may include re-encoding the video stream, adjusting an orientation, removing a video detail, and/or adjusting a bit rate. The device may send the modified second video stream to the first device and/or the third device. The device may send the modified first video stream to the second device. The device may send the modified fourth video stream to the first device and/or the third device. The device may compare bit rates associated with the first viewing parameter and the third viewing parameter. When the third viewing parameter is associated with a higher bit rate than the first viewing parameter, the device may modify the fourth video stream based on the third viewing parameter.

A device may control a video communication via traffic shaping. The device may include an MCU. The device may receive a first video stream from a first device and a second video stream from a second device. The device may determine a viewing parameter associated with the first device by analyzing the first video stream. The viewing parameter may include a user viewing parameter, a device viewing parameter, and/or a content viewing parameter. The device may determine, based on the viewing parameter, a video stream bit rate for the second video stream. The device may indicate the video stream bit rate to the second device. The device may indicate the video stream bit rate by removing one or more packets from the second video stream before sending the second video stream to the first device.

The device may indicate the video stream bit rate by sending a feedback message that indicates an adjusted packet loss rate. The device may measure a packet loss rate for the second video stream. The device may determine the adjusted packet loss rate for the second video stream. The adjusted packet loss rate may be associated with the determined video stream bit rate. The adjusted packet loss rate may differ from the measured packet loss rate. The device may generate a feedback message that indicates the adjusted packet loss rate. The device may send the feedback message to the second device.

The device may indicate the video stream bit rate by signaling a bandwidth limit. The device may determine a first viewing parameter for the first device and a third viewing parameter for a third device. The first viewing parameter may be associated with the first video stream. The third viewing parameter may be associated with a third video stream, which may be from the third device. The device may determine a first video stream bit rate for the second video stream and/or a second video stream bit rate for the second video stream. The first video stream bit rate may be based on the first viewing parameter. The second video stream bit rate may be based on the third viewing parameter. The device may indicate a bandwidth limit to the second device. The bandwidth limit may be associated with the first video stream bit rate and/or the second video stream bit rate.

A server may control a video communication between two or more devices. The server may receive a sample of a first video stream from a first device. The server may determine a viewing parameter based on the sample. The viewing parameter may be associated with the first device. The server may indicate a modification to a second video stream based on the viewing parameter. The modification may include adjusting the bit rate, adjusting the resolution, removing detail, adjusting the orientation, and/or filtering. The server may generate a message that indicates the modification to the second video stream. The server may send the message to the second device.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a system diagram of an example communications system in which one or more disclosed embodiments may be implemented.

FIG. 1B is a system diagram of an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A.

FIG. 1C is a system diagram of an example radio access network and an example core network that may be used within the communications system illustrated in FIG. 1A.

FIG. 1D is a system diagram of another example radio access network and another example core network that may be used within the communications system illustrated in FIG. 1A.

FIG. 1E is a system diagram of another example radio access network and another example core network that may be used within the communications system illustrated in FIG. 1A.

FIG. 2A is a diagram illustrating an example mobile video telephony system.

FIG. 2B is an illustration of example parameters of a viewing setup.

FIG. 2C is an illustration of an example of a contrast sensitivity function using a Campbell-Robson chart.

FIG. 3 illustrates an example video telephony session between WTRUs of differing orientations.

FIG. 4 illustrates an example video telephony session between WTRUs of differing orientations.

FIG. 5 is a diagram illustrating an example video telephony system comprising WTRUs in communication with one another via a network.

FIG. 6 is a diagram illustrating an example video telephony system comprising WTRUs in communication with one another via a network.

FIG. 7 is a diagram illustrating an example video telephony system comprising WTRUs in communication with one another via a network.

FIG. 8 is a diagram illustrating an example video telephony system comprising WTRUs in communication with one another via a network, with video capturing of one WTRU based on an orientation of another WTRU.

FIGS. 9A-9D are diagrams illustrating examples of showing video at a receiving WTRU for a given orientation of the display of a WTRU relative to the observer.

FIGS. 10A-10B are diagrams illustrating an example of sender-side cropping.

FIGS. 11A-11B are diagrams illustrating an example of sender-side downsizing or down sampling.

FIGS. 12A-12B are diagrams illustrating an example of image sensor selection.

FIG. 13 is a diagram illustrating an example of image sensor array rotation.

FIG. 14 is a diagram illustrating an example up direction to a width for a video picture.

FIG. 15 is a diagram illustrating an example eye-axis of a user.

FIG. 16 is a diagram illustrating an example projection of an eye-axis onto a display plane.

FIG. 17 is a diagram illustrating an example call flow for capturing video locally according to an orientation of a remote device.

FIG. 18 is a diagram illustrating an example User Adaptive Video (UAV) in a Multipoint Control Unit (MCU) setting.

FIG. 19A is a diagram illustrating an example of an MCU implementing UAV with an encoder per client endpoint for multiple clients.

FIG. 19B is a diagram illustrating another example of an MCU implementing UAV with an encoder per client endpoint for multiple clients.

FIG. 20 is a diagram illustrating an example of an MCU with video mixing and a shared encoder.

FIG. 21 is a diagram illustrating an example of an MCU traffic shaping technique for UAV.

FIG. 22 is an illustration of an example logical connection among one or more video conferencing participants and an MCU.

FIG. 23 is a diagram illustrating an example architecture of a system in which a UAV application operates via the Cloud.

FIG. 24 is an illustration of an example mesh configuration for Web Real-Time Communication (RTC).

DETAILED DESCRIPTION

A detailed description of illustrative embodiments will now be described with reference to the various Figures. Although this description provides a detailed example of possible implementations, it should be noted that the details are intended to be examples and in no way limit the scope of the application.

FIG. 1A is a diagram of an example communications system 100 in which one or more disclosed embodiments may be implemented. The communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users. The communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth. For example, the communications systems 100 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), and the like.

As shown in FIG. 1A, the communications system 100 may include wireless transmit/receive units (WTRUs) 102 a, 102 b, 102 c, and/or 102 d (which generally or collectively may be referred to as WTRU 102), a radio access network (RAN) 103/104/105, a core network 106/107/109, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements. Each of the WTRUs 102 a, 102 b, 102 c, 102 d may be any type of device configured to operate and/or communicate in a wireless environment. By way of example, the WTRUs 102 a, 102 b, 102 c, 102 d may be configured to transmit and/or receive wireless signals and may include user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, consumer electronics, and the like.

The communications systems 100 may also include a base station 114 a and a base station 114 b. Each of the base stations 114 a, 114 b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102 a, 102 b, 102 c, 102 d to facilitate access to one or more communication networks, such as the core network 106/107/109, the Internet 110, and/or the networks 112. By way of example, the base stations 114 a, 114 b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114 a, 114 b are each depicted as a single element, it will be appreciated that the base stations 114 a, 114 b may include any number of interconnected base stations and/or network elements.

The base station 114 a may be part of the RAN 103/104/105, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 114 a and/or the base station 114 b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into cell sectors. For example, the cell associated with the base station 114 a may be divided into three sectors. Thus, in one embodiment, the base station 114 a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 114 a may employ multiple-input multiple-output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.

The base stations 114 a, 114 b may communicate with one or more of the WTRUs 102 a, 102 b, 102 c, 102 d over an air interface 115/116/117, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 115/116/117 may be established using any suitable radio access technology (RAT).

More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114 a in the RAN 103/104/105 and the WTRUs 102 a, 102 b, 102 c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).

In another embodiment, the base station 114 a and the WTRUs 102 a, 102 b, 102 c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 115/116/117 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).

In other embodiments, the base station 114 a and the WTRUs 102 a, 102 b, 102 c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1×, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.

The base station 114 b in FIG. 1A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, and the like. In one embodiment, the base station 114 b and the WTRUs 102 c, 102 d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In another embodiment, the base station 114 b and the WTRUs 102 c, 102 d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 114 b and the WTRUs 102 c, 102 d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, etc.) to establish a picocell or femtocell. As shown in FIG. 1A, the base station 114 b may have a direct connection to the Internet 110. Thus, the base station 114 b may not be required to access the Internet 110 via the core network 106/107/109.

The RAN 103/104/105 may be in communication with the core network 106/107/109, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102 a, 102 b, 102 c, 102 d. For example, the core network 106/107/109 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN 103/104/105 and/or the core network 106/107/109 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 103/104/105 or a different RAT. For example, in addition to being connected to the RAN 103/104/105, which may be utilizing an E-UTRA radio technology, the core network 106/107/109 may also be in communication with another RAN (not shown) employing a GSM radio technology.

The core network 106/107/109 may also serve as a gateway for the WTRUs 102 a, 102 b, 102 c, 102 d to access the PSTN 108, the Internet 110, and/or other networks 112. The PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 112 may include wired or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 103/104/105 or a different RAT.

Some or all of the WTRUs 102 a, 102 b, 102 c, 102 d in the communications system 100 may include multi-mode capabilities, i.e., the WTRUs 102 a, 102 b, 102 c, 102 d may include multiple transceivers for communicating with different wireless networks over different wireless links. For example, the WTRU 102 c shown in FIG. 1A may be configured to communicate with the base station 114 a, which may employ a cellular-based radio technology, and with the base station 114 b, which may employ an IEEE 802 radio technology.

FIG. 1B is a system diagram of an example WTRU 102. As shown in FIG. 1B, the WTRU 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, non-removable memory 130, removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and other peripherals 138. It will be appreciated that the WTRU 102 may include any sub-combination of the foregoing elements while remaining consistent with an embodiment. Also, embodiments contemplate that the base stations 114 a and 114 b, and/or the nodes that base stations 114 a and 114 b may represent, such as but not limited to a base transceiver station (BTS), a Node-B, a site controller, an access point (AP), a home node-B, an evolved home node-B (eNodeB), a home evolved node-B (HeNB or HeNodeB), a home evolved node-B gateway, and proxy nodes, among others, may include some or all of the elements depicted in FIG. 1B and described herein.

The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.

The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114 a) over the air interface 115/116/117. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.

In addition, although the transmit/receive element 122 is depicted in FIG. 1B as a single element, the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 115/116/117.

The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the WTRU 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.

The processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).

The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102. The power source 134 may be any suitable device for powering the WTRU 102. For example, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.

The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 115/116/117 from a base station (e.g., base stations 114 a, 114 b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination implementation while remaining consistent with an embodiment.

The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.

FIG. 1C is a system diagram of the RAN 103 and the core network 106 according to an embodiment. As noted above, the RAN 103 may employ a UTRA radio technology to communicate with the WTRUs 102 a, 102 b, 102 c over the air interface 115. The RAN 103 may also be in communication with the core network 106. As shown in FIG. 1C, the RAN 103 may include Node-Bs 140 a, 140 b, 140 c, which may each include one or more transceivers for communicating with the WTRUs 102 a, 102 b, 102 c over the air interface 115. The Node-Bs 140 a, 140 b, 140 c may each be associated with a particular cell (not shown) within the RAN 103. The RAN 103 may also include RNCs 142 a, 142 b. It will be appreciated that the RAN 103 may include any number of Node-Bs and RNCs while remaining consistent with an embodiment.

As shown in FIG. 1C, the Node-Bs 140 a, 140 b may be in communication with the RNC 142 a. Additionally, the Node-B 140 c may be in communication with the RNC 142 b. The Node-Bs 140 a, 140 b, 140 c may communicate with the respective RNCs 142 a, 142 b via an Iub interface. The RNCs 142 a, 142 b may be in communication with one another via an Iur interface. Each of the RNCs 142 a, 142 b may be configured to control the respective Node-Bs 140 a, 140 b, 140 c to which it is connected. In addition, each of the RNCs 142 a, 142 b may be configured to carry out or support other functionality, such as outer loop power control, load control, admission control, packet scheduling, handover control, macrodiversity, security functions, data encryption, and the like.

The core network 106 shown in FIG. 1C may include a media gateway (MGW) 144, a mobile switching center (MSC) 146, a serving GPRS support node (SGSN) 148, and/or a gateway GPRS support node (GGSN) 150. While each of the foregoing elements are depicted as part of the core network 106, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.

The RNC 142 a in the RAN 103 may be connected to the MSC 146 in the core network 106 via an IuCS interface. The MSC 146 may be connected to the MGW 144. The MSC 146 and the MGW 144 may provide the WTRUs 102 a, 102 b, 102 c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102 a, 102 b, 102 c and traditional land-line communications devices.

The RNC 142 a in the RAN 103 may also be connected to the SGSN 148 in the core network 106 via an IuPS interface. The SGSN 148 may be connected to the GGSN 150. The SGSN 148 and the GGSN 150 may provide the WTRUs 102 a, 102 b, 102 c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102 a, 102 b, 102 c and IP-enabled devices.

As noted above, the core network 106 may also be connected to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.

FIG. 1D is a system diagram of the RAN 104 and the core network 107 according to an embodiment. As noted above, the RAN 104 may employ an E-UTRA radio technology to communicate with the WTRUs 102 a, 102 b, 102 c over the air interface 116. The RAN 104 may also be in communication with the core network 107.

The RAN 104 may include eNode-Bs 160 a, 160 b, 160 c, though it will be appreciated that the RAN 104 may include any number of eNode-Bs while remaining consistent with an embodiment. The eNode-Bs 160 a, 160 b, 160 c may each include one or more transceivers for communicating with the WTRUs 102 a, 102 b, 102 c over the air interface 116. In one embodiment, the eNode-Bs 160 a, 160 b, 160 c may implement MIMO technology. Thus, the eNode-B 160 a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 102 a.

Each of the eNode-Bs 160 a, 160 b, 160 c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the uplink and/or downlink, and the like. As shown in FIG. 1D, the eNode-Bs 160 a, 160 b, 160 c may communicate with one another over an X2 interface.

The core network 107 shown in FIG. 1D may include a mobility management entity (MME) 162, a serving gateway 164, and a packet data network (PDN) gateway 166. While each of the foregoing elements are depicted as part of the core network 107, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.

The MME 162 may be connected to each of the eNode-Bs 160 a, 160 b, 160 c in the RAN 104 via an S1 interface and may serve as a control node. For example, the MME 162 may be responsible for authenticating users of the WTRUs 102 a, 102 b, 102 c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 102 a, 102 b, 102 c, and the like. The MME 162 may also provide a control plane function for switching between the RAN 104 and other RANs (not shown) that employ other radio technologies, such as GSM or WCDMA.

The serving gateway 164 may be connected to each of the eNode-Bs 160 a, 160 b, 160 c in the RAN 104 via the S1 interface. The serving gateway 164 may generally route and forward user data packets to/from the WTRUs 102 a, 102 b, 102 c. The serving gateway 164 may also perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when downlink data is available for the WTRUs 102 a, 102 b, 102 c, managing and storing contexts of the WTRUs 102 a, 102 b, 102 c, and the like.

The serving gateway 164 may also be connected to the PDN gateway 166, which may provide the WTRUs 102 a, 102 b, 102 c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102 a, 102 b, 102 c and IP-enabled devices.

The core network 107 may facilitate communications with other networks. For example, the core network 107 may provide the WTRUs 102 a, 102 b, 102 c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102 a, 102 b, 102 c and traditional land-line communications devices. For example, the core network 107 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the core network 107 and the PSTN 108. In addition, the core network 107 may provide the WTRUs 102 a, 102 b, 102 c with access to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.

FIG. 1E is a system diagram of the RAN 105 and the core network 109 according to an embodiment. The RAN 105 may be an access service network (ASN) that employs IEEE 802.16 radio technology to communicate with the WTRUs 102 a, 102 b, 102 c over the air interface 117. As will be further discussed below, the communication links between the different functional entities of the WTRUs 102 a, 102 b, 102 c, the RAN 105, and the core network 109 may be defined as reference points.

As shown in FIG. 1E, the RAN 105 may include base stations 180 a, 180 b, 180 c, and an ASN gateway 182, though it will be appreciated that the RAN 105 may include any number of base stations and ASN gateways while remaining consistent with an embodiment. The base stations 180 a, 180 b, 180 c may each be associated with a particular cell (not shown) in the RAN 105 and may each include one or more transceivers for communicating with the WTRUs 102 a, 102 b, 102 c over the air interface 117. In one embodiment, the base stations 180 a, 180 b, 180 c may implement MIMO technology. Thus, the base station 180 a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 102 a. The base stations 180 a, 180 b, 180 c may also provide mobility management functions, such as handoff triggering, tunnel establishment, radio resource management, traffic classification, quality of service (QoS) policy enforcement, and the like. The ASN gateway 182 may serve as a traffic aggregation point and may be responsible for paging, caching of subscriber profiles, routing to the core network 109, and the like.

The air interface 117 between the WTRUs 102 a, 102 b, 102 c and the RAN 105 may be defined as an R1 reference point that implements the IEEE 802.16 specification. In addition, each of the WTRUs 102 a, 102 b, 102 c may establish a logical interface (not shown) with the core network 109. The logical interface between the WTRUs 102 a, 102 b, 102 c and the core network 109 may be defined as an R2 reference point, which may be used for authentication, authorization, IP host configuration management, and/or mobility management.

The communication link between each of the base stations 180 a, 180 b, 180 c may be defined as an R8 reference point that includes protocols for facilitating WTRU handovers and the transfer of data between base stations. The communication link between the base stations 180 a, 180 b, 180 c and the ASN gateway 182 may be defined as an R6 reference point. The R6 reference point may include protocols for facilitating mobility management based on mobility events associated with each of the WTRUs 102 a, 102 b, 102 c.

As shown in FIG. 1E, the RAN 105 may be connected to the core network 109. The communication link between the RAN 105 and the core network 109 may be defined as an R3 reference point that includes protocols for facilitating data transfer and mobility management capabilities, for example. The core network 109 may include a mobile IP home agent (MIP-HA) 184, an authentication, authorization, accounting (AAA) server 186, and a gateway 188. While each of the foregoing elements are depicted as part of the core network 109, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.

The MIP-HA 184 may be responsible for IP address management, and may enable the WTRUs 102 a, 102 b, 102 c to roam between different ASNs and/or different core networks. The MIP-HA 184 may provide the WTRUs 102 a, 102 b, 102 c with access to packet-switched networks, such as the Internet 110, to facilitate communications between the WTRUs 102 a, 102 b, 102 c and IP-enabled devices. The AAA server 186 may be responsible for user authentication and for supporting user services. The gateway 188 may facilitate interworking with other networks. For example, the gateway 188 may provide the WTRUs 102 a, 102 b, 102 c with access to circuit-switched networks, such as the PSTN 108, to facilitate communications between the WTRUs 102 a, 102 b, 102 c and traditional land-line communications devices. In addition, the gateway 188 may provide the WTRUs 102 a, 102 b, 102 c with access to the networks 112, which may include other wired or wireless networks that are owned and/or operated by other service providers.

Although not shown in FIG. 1E, it will be appreciated that the RAN 105 may be connected to other ASNs and the core network 109 may be connected to other core networks. The communication link between the RAN 105 and the other ASNs may be defined as an R4 reference point, which may include protocols for coordinating the mobility of the WTRUs 102 a, 102 b, 102 c between the RAN 105 and the other ASNs. The communication link between the core network 109 and the other core networks may be defined as an R5 reference point, which may include protocols for facilitating interworking between home core networks and visited core networks.

FIG. 2A illustrates an example mobile video telephony system 200. A first user 202 may have a visual link 204 with a first wireless transmit/receive unit (WTRU) 206. The first WTRU 206 may include a smartphone or tablet. The first WTRU 206 may communicate with a first eNB 208 via a wireless link 210. The first eNB 208 may communicate with a network, such as the Internet 212, via a gateway (GW) 214. A second user 216 may have a visual link 218 with a second WTRU 220. The second WTRU 220 may communicate with a second eNB 222 via a wireless link 224. The second eNB 222 may communicate with the Internet 212 via a GW 226. Embodiments contemplate that other wireless technologies and/or communication paths may be possible for a mobile video telephony system. For example, one or more WTRUs may be connected using IEEE 802.11 technology. The one or multiple eNBs, such as eNB 208 and eNB 222, may be replaced with one or multiple Wi-Fi access points.

Systems, methods, and instrumentalities are disclosed for communicating video data, reducing the likelihood of congestion while managing latency, and reducing the bandwidth demand while avoiding degradation in video quality. Glare from relatively narrow beam light sources, such as light bulbs, the sun, etc., may be reduced. Backlight brightness may be adapted in response to the overall diffuse illuminance of the background.

Communication of video data may be adapted to user viewing conditions for both streaming and real-time video telephony applications. In the context of real-time video telephony, the front-facing camera may be used, e.g., use of the front-facing camera may be assumed. Buffering may be limited for interactive video telephony sessions. Video telephony may involve the communication of data that is not pre-encoded.

Communication of video data may be adapted to different user/device orientations. Mobile devices in a conversation may exchange orientation information.

A multipoint control unit (MCU) may be used to bridge videoconferencing connections. The MCU may be used to allow more than one endpoint and/or gateway to connect in a multipoint conference. An MCU may provide one or more of the following functions: call setup, admission control, audio mixing, simple voice switching, transcoding between different video formats, rate adaptation, continuous presence (e.g., video mixing where multiple parties may be seen at once), among others, for example. A device (e.g., a video telephony client) may analyze a video stream from another device to determine how to encode the transmitted video. The MCU may analyze the video streams from one or more devices (e.g., individual conference participants) and/or modify one or more of the associated video streams. Analyzing the video stream may include analyzing video packet data, video bit stream data, side information and/or control signaling associated with the video stream. Control signaling may be conveyed, for example, using SIP messages, H.245 messages, HTTP requests/responses, and/or the like. The MCU may signal (e.g., indicate) a video stream modification to one or more of the devices (e.g., the individual endpoints). One or more of the devices may implement user adaptive video telephony. One or more of the devices may implement user adaptive video telephony based on the signal from the MCU.

The visibility of information may be determined based on one or more factors associated with information displayed on a display. For example, perceptible detail and/or imperceptible detail may be recognized in displayed video content. The difference between perceptible detail and imperceptible detail may be determined. A number of factors may be used to determine the visibility of information displayed on a display. These factors may include a viewing parameter. The viewing parameter may include one or more of: viewing distance (e.g., distance of user to screen), contrast sensitivity, display size, display pixel density, ambient illumination, motion of the display relative to the user, and other factors, for example. The flexibility in usage of mobile devices may contribute to the variability of the viewing parameters. For example, a mobile device held at arm's length away from the user may present information at a higher spatial density, compared to a device held closer to (e.g., inches from) the user's face. As another example, visibility of information on the mobile device display may be lower when the device is viewed under direct sunlight than when the device is viewed in a dark room. As another example, the user may perceive less detail if the device is in motion relative to the user's eyes (e.g., the user is walking down a busy street holding a phone in his hand), than when the device is not in motion (e.g., the user is sitting in a chair holding the phone).
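
For illustration, the viewing parameters discussed above may be grouped into a simple data structure. The following is a minimal sketch in Python; the field names and units are assumptions chosen for readability, not part of any defined interface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ViewingParameters:
    """Illustrative container for viewing parameters; any field may be unknown."""
    viewing_distance_cm: Optional[float] = None       # distance of user to screen
    display_width_cm: Optional[float] = None          # physical display size
    display_pixel_density_ppi: Optional[float] = None
    ambient_illuminance_lux: Optional[float] = None
    contrast_sensitivity: Optional[float] = None
    display_in_motion: bool = False                    # device moving relative to the user
```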

One or more viewing parameters may be used to determine and/or to estimate the amount of detail that a user could perceive (e.g., by utilizing models of human visual perception). The encoding and/or sending of video content may be adapted (e.g., modified) based on viewing parameter(s). The modified encoding and/or sending of video content may preserve one or more details that may be perceptible to the user. The modified encoding and/or sending of video content may preserve perceptible details. For example, a pre-processing filter may remove details that the viewing user might not perceive (e.g., given the current viewing conditions). Pre-processed video content may be encoded (e.g., re-encoded) using a lower bit rate than the original video content. The properties of video capture, re-sampling, and/or encoding may be modified (e.g., directly adapted). The properties of video capture, re-sampling, and/or encoding may be modified to capture a level of detail (e.g., a highest level) that the viewing user can perceive, based on the current viewing conditions and/or viewing parameters. The video content may be encoded using a video resolution (e.g., a lowest level) and/or a bit rate (e.g., a lowest bit rate), to preserve the amount of detail which the viewing user may be capable of perceiving. The properties of video capture, re-sampling, and/or encoding (e.g., a video resolution or a bit rate) may be determined based on human visual perception models. The properties of video capture, re-sampling, and/or encoding may be determined experimentally.
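
As an illustration of such a pre-processing filter, the sketch below low-passes a frame so that spatial frequencies above a visible cut-off are attenuated before encoding. It assumes frames arrive as NumPy arrays, and the mapping from cut-off frequency to Gaussian sigma is a simple heuristic rather than a tuned perceptual model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def perceptual_prefilter(frame, cutoff):
    """Attenuate spatial frequencies above `cutoff` (cycles per pixel,
    Nyquist = 0.5) before encoding. A Gaussian's frequency response
    exp(-2*pi^2*sigma^2*f^2) falls to ~0.5 at f = cutoff when
    sigma = sqrt(ln 2 / 2) / (pi * cutoff)."""
    sigma = np.sqrt(np.log(2.0) / 2.0) / (np.pi * cutoff)
    # Filter spatially only; leave the color-channel axis untouched.
    per_axis_sigma = (sigma, sigma, 0) if frame.ndim == 3 else sigma
    return gaussian_filter(frame, sigma=per_axis_sigma)

# Example: remove detail above 1/6 cycles per pixel from a synthetic frame.
frame = np.random.rand(480, 640, 3).astype(np.float32)
filtered = perceptual_prefilter(frame, cutoff=1.0 / 6.0)
```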

FIG. 2B illustrates some example parameters (e.g., viewing parameters) of a video viewing setup. For purposes of illustration, a horizontal slice is shown, and it is assumed that the visual field is formed by binocular vision (e.g., it is about 120° horizontally). The viewing parameters may include screen size, distance to the screen, screen resolution, screen density (in pixels per inch), and/or viewing angle. Viewing parameters may be interrelated. For example, the viewing angle may be computed as:

$\text{viewing angle} = 2\arctan\left(\frac{\text{screen width}}{2\cdot\text{distance}}\right).$
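
The relationship above may be evaluated directly; a minimal sketch in Python follows, with illustrative numbers.

```python
import math

def viewing_angle_deg(screen_width, distance):
    """Horizontal viewing angle (degrees) subtended by a screen of the given
    width viewed from the given distance (any consistent length unit)."""
    return 2.0 * math.degrees(math.atan(screen_width / (2.0 * distance)))

# Example: a 7 cm wide phone screen held 35 cm from the eyes.
print(round(viewing_angle_deg(7.0, 35.0), 1))  # ~11.4 degrees
```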

Contrast, or luminance contrast, is a perceptual measure (e.g., a viewing parameter) that may define the difference between the perceived lightness of two colors. The contrast of a periodic pattern such as a sinusoidal grating may be measured using Michelson's contrast, defined as:

$C = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}},$

where $L_{\max}$ and $L_{\min}$ are the maximum and minimum luminance values, respectively. The contrast may also be defined as:

$C = \frac{(L_{\max} - L_{\min})/2}{(L_{\max} + L_{\min})/2} = \frac{\text{Amplitude}}{\text{Average}}.$

The level of contrast that may be useful to elicit a perceived response by the human visual system may be the contrast threshold, and the inverse of the contrast threshold may be the contrast sensitivity. Contrast sensitivity may be computed as follows:

$\text{Contrast sensitivity} = \frac{1}{\text{Contrast threshold}} = \frac{\text{Average}}{\text{Amplitude}}.$
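
As a worked example of the two definitions above, the following sketch computes Michelson contrast and the corresponding contrast sensitivity; the luminance values are illustrative.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast of a periodic pattern from its max/min luminance."""
    return (l_max - l_min) / (l_max + l_min)

def contrast_sensitivity(contrast_threshold):
    """Contrast sensitivity is the reciprocal of the contrast threshold."""
    return 1.0 / contrast_threshold

# Example: a grating whose luminance swings between 120 and 80 cd/m^2.
c = michelson_contrast(120.0, 80.0)   # 0.2
print(c, contrast_sensitivity(c))     # 0.2 5.0
```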

FIG. 2C is an illustration of an example of a contrast sensitivity function using a Campbell-Robson chart. Contrast sensitivity may vary as a function of spatial frequency. In a Campbell-Robson chart, spatial frequency increases logarithmically from left to right and contrast decreases logarithmically from bottom to top. The relationship between contrast sensitivity and spatial frequency is called the contrast sensitivity function (CSF), and the CSF curve is illustrated in FIG. 2C.

The CSF may have a maximum at 4 cycles per degree (CPD). The CSF may decrease at both lower and higher frequencies (e.g., thereby yielding a band-pass characteristic). The CSF curve may define a threshold of visibility. The region above the CSF curve may be invisible to a human observer. Several different CSF models may be used, such as the models by Movshon and Kiorpes, Barten, and Daly.
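
For illustration, one commonly cited band-pass parameterization of the CSF (attributed to Movshon and Kiorpes) has the form a·f^c·e^(−b·f). The coefficients below are assumptions chosen so that sensitivity peaks near 4 CPD, consistent with the description above; they are not values taken from this document.

```python
import math

def csf(f, a=75.0, b=0.2, c=0.8):
    """Band-pass contrast sensitivity versus spatial frequency f (cycles per
    degree). Coefficients are illustrative; with b=0.2 and c=0.8 the peak
    falls at f = c/b = 4 CPD."""
    return a * (f ** c) * math.exp(-b * f)

# Sensitivity rises toward a few cycles per degree and falls off on both sides.
for f in (0.5, 1, 2, 4, 8, 16, 32):
    print(f, round(csf(f), 1))
```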

FIG. 3 illustrates an example video telephony session between WTRUs 300, 302 of differing orientations. In the example shown in FIG. 3, WTRUs 300, 302 may conduct a two-way video calling session. The best video quality/experience may be achieved when the orientations of WTRUs 300, 302 match (e.g., when both WTRUs 300, 302 are in a portrait orientation or in a landscape orientation). When the orientations of WTRUs 300, 302 are not aligned (e.g., when WTRU 300 is in a portrait orientation and WTRU 302 is in a landscape orientation as depicted in FIG. 3), the received image (e.g., video and/or picture) may be reformatted. The received image may be down-sampled and displayed in a pillarbox format, with black bars 304, 306, 308, 310 placed at the sides of the images 312, 314. The down-sampled and reformatted image may degrade (e.g., significantly degrade) the user experience. Only a portion of the video/picture that is coded at the original resolution and transmitted across the communication network may be displayed. Displaying a portion of the video/picture may waste resources (e.g., bandwidth) of the communication system.

FIG. 4 illustrates an example video telephony session between WTRUs 400, 402 of differing orientations. In the example shown in FIG. 4, WTRUs 400, 402 may conduct a two-way video calling session. The video stream may be captured (e.g., at the sender of the video) based on the orientation of the sender WTRU. For example, the sender WTRU may capture video in a landscape format when the sender WTRU is oriented in a landscape orientation, such as WTRU 402. The receiver WTRU may crop the received video stream to match the orientation of the receiver WTRU. For example, the receiver WTRU may crop a received landscape video stream to a portrait format when the receiver WTRU is oriented in a portrait orientation, such as WTRU 400. Cropping the video (e.g., inappropriately) may lead to loss of important objects in the scene. Sending the entire video across the communication system may be inefficient if part of the video may not be displayed at the receiver end.
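
As an illustration of the receiver-side cropping just described, the sketch below computes a centered crop window that matches the receiver's aspect ratio. It is illustrative only; as noted above, naive center cropping may cut off important objects, and a practical system might instead track a region of interest.

```python
def center_crop_to_aspect(frame_w, frame_h, display_w, display_h):
    """Return (x, y, w, h) of a centered crop of a frame so that it matches
    the receiver display's aspect ratio."""
    target_aspect = display_w / display_h
    if frame_w / frame_h > target_aspect:
        # Frame is wider than the display: trim the sides.
        crop_w, crop_h = int(frame_h * target_aspect), frame_h
    else:
        # Frame is taller than the display: trim the top and bottom.
        crop_w, crop_h = frame_w, int(frame_w / target_aspect)
    return (frame_w - crop_w) // 2, (frame_h - crop_h) // 2, crop_w, crop_h

# A landscape 1280x720 capture cropped for a portrait 720x1280 display.
print(center_crop_to_aspect(1280, 720, 720, 1280))  # (437, 0, 405, 720)
```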

Viewing conditions may be estimated. Viewing conditions may include one or more viewing parameters. Because the video from the camera of a WTRU may be available on both a local WTRU and a remote WTRU of a video telephony session, the viewing conditions may be estimated by either the local WTRU or the remote WTRU.

A viewing parameter may be signaled (e.g., indicated) to the remote WTRU (e.g., when the local WTRU estimates the viewing parameter). For example, the viewing parameter may be signaled using SEI messages embedded in the video bit stream, extensions of control protocol messages (e.g., H.245 in the H.323 stack), APP messages sent over RTCP, and/or additional protocols (e.g., custom protocols).

Face detection and/or distance estimation logic may be applied to the received video stream (e.g., when the remote WTRU estimates the viewing parameter). Remote WTRU estimation (e.g., remote-end estimation) can be deployed without any changes in standards. Remote WTRU estimation may work well with high-resolution and/or high quality conferencing. Face recognition may benefit from high-resolution conferencing. Ambient light estimation may include auto-exposure logic. Remote WTRU estimation may include knowledge of the pixel density of the display on the other end (e.g., the local WTRU display) of the video telephony session.

The video stream may contain information that may be used to infer certain aspects of the user viewing conditions. Viewing condition information (e.g., one or more viewing parameters) in addition to the video stream may be signaled, as shown in FIG. 5. FIG. 5 illustrates an example video telephony system 500 that includes WTRUs 502, 504 in communication with one another via a network 506, such as the Internet. The WTRUs 502, 504 may include respective cameras/sensors 508, 510 and respective displays 512, 514. The WTRUs 502, 504 may execute respective video telephony applications 516, 518. The video telephony applications 516, 518 may be stored in memory devices. The video telephony applications 516, 518 may include respective video decoders 520, 522 and respective video encoders 524, 526.

As shown in FIG. 5, the camera/sensor 508 of the WTRU 502 may signal one or more viewing parameters to the WTRU 504 (e.g., the video encoder 526 of the WTRU 504). The camera/sensor 510 of the WTRU 504 may signal the one or more viewing parameters to the video encoder 524 of the WTRU 502. The one or more viewing parameters may be signaled from a first WTRU to a second WTRU. The one or more viewing parameters may be used to encode video using a video encoder of the second WTRU. The one or more viewing parameters may include, but are not limited to, for example, camera settings, focus distance, aperture settings, and/or additional sensor data, such as ambient illuminance, accelerometer data, proximity detection, etc. The one or more viewing parameters may be signaled for use in rate selection and/or perceptual pre-filtering at the remote WTRU (e.g., an encoder at the remote WTRU).
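
By way of illustration, such parameters could be serialized into a compact payload and carried, for example, in the application-dependent portion of an RTCP APP message or in a custom protocol. The field names and transport below are assumptions, not a defined format.

```python
import json

# Hypothetical out-of-band report of locally measured viewing parameters.
viewing_report = {
    "focus_distance_cm": 38.0,         # from the camera's autofocus
    "ambient_illuminance_lux": 250.0,  # from the ambient light sensor
    "user_present": True,              # from local face detection
    "device_in_motion": False,         # from the accelerometer
}
payload = json.dumps(viewing_report).encode("utf-8")
```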

Signaling viewing condition information (e.g., one or more viewing parameters) in addition to a video stream may reduce latency because the video encoding and/or decoding is not in the signaling path. Signaling one or more viewing parameters may improve accuracy of facial detection. For example, a sending WTRU may perform local facial detection with higher accuracy than may be possible at a receiving WTRU (e.g., in order to determine parameters such as user presence, user attention to the screen, and/or distance of user to screen, among other reasons). The sending WTRU may transmit one or more viewing parameters to the receiving WTRU.

One or more of the viewing parameters described herein, including those viewing parameters described and shown in FIG. 5, may be signaled via a network.

FIG. 6 illustrates an example video telephony system 600 comprising WTRUs 602, 604 in communication with one another via a network 606, such as the Internet. The WTRUs 602, 604 may include respective cameras/sensors 608, 610 and respective displays 612, 614. The WTRUs 602, 604 may execute respective video telephony applications 616, 618 stored in memory devices. The video telephony applications 616, 618 may include respective video decoders 620, 622 and respective video encoders 624, 626.

As shown in FIG. 6, the camera/sensor 608 of the WTRU 602 may signal viewing condition information (e.g., one or more viewing parameters) to the video encoder 626 of the WTRU 604. The camera/sensor 610 of the WTRU 604 may signal viewing condition information to the video encoder 624 of the WTRU 602. The one or more viewing parameters may include the maximum resolvable spatial frequency information and/or perceivable contrast ratios. The one or more viewing parameters may be used in rate selection and/or perceptual pre-filtering at the remote WTRU (e.g., the encoder at the remote WTRU). The one or more viewing parameters may include compact representations. The sender WTRU may process sensor data into a maximum spatial frequency and a perceivable contrast ratio.

The maximum resolvable spatial frequency and/or the perceivable contrast ratio or sensor data may be communicated in-band as part of the application layer protocol, or may be included as extensions to the Session Initiation Protocol (SIP), Session Description Protocol (SDP), and/or Real-Time Control Protocol (RTCP). For example, RTCP Receiver Reports (RRs) may include information regarding the viewing conditions at the receiver WTRU. The signaling may be sent, for example, over the network, among other scenarios.

FIG. 7 illustrates an example video telephony system 700 comprising WTRUs 702, 704 in communication with one another via a network 706, such as the Internet. The WTRUs 702, 704 may include respective cameras/sensors 708, 710 and respective displays 712, 714 and may execute respective video telephony applications 716, 718 stored in memory devices. The video telephony applications 716, 718 may include respective video decoders 720, 722 and respective video encoders 724, 726.

The one or more viewing parameters may include camera setting information. The camera setting information may be signaled (e.g., to improve interpretation of the video data for synthesis into maximum resolvable spatial frequency and/or perceivable contrast ratio). The camera setting information may include user distance and/or illuminance. The camera setting information may be used for synthesis into maximum resolvable spatial frequency and contrast ratio at the remote WTRU (e.g., an encoder at the remote WTRU). FIG. 7 depicts an example flow of sensor information (e.g., the one or more viewing parameters) for the video telephony system 700.

The bit rate of the video encoding may be adjusted (e.g., to avoid delivering information that cannot be perceived by the end user). Bit rate selection may be driven by one or more viewing parameters, including, for example, the maximum spatial frequency that can be resolved by the receiving user, the maximum contrast ratio that is perceivable by the receiving user, and/or the attention of the receiving user. The attention of the receiving user may be based, for example, on eye tracking.

The one or more viewing parameters may include display characteristics. The display characteristics may be included in a device viewing parameter. The display characteristics may be signaled. The display characteristics may establish the limits on spatial frequency acuity. The display characteristics may include the size of the receiver's display, aspects of its maximum contrast ratio, and/or details of its maximum illuminance. The video stream may be modified based on the display resolution of the receiving WTRU (e.g., to avoid transmission of spatial frequencies that cannot be reproduced by the display of the receiving WTRU). The display characteristics may be exchanged as part of call setup using extensions to SDP. The display characteristics may be exchanged as part of the SIP protocol exchange. The display characteristics may change dynamically, for example, when the session switches from two-party to multi-party, or when there is an orientation switch from portrait to landscape or vice versa. The functional relationship between the screen tilt and the contrast ratio may be used to determine the spatial frequency acuity. The display characteristics may enable synthesis of maximum perceivable spatial frequency information at the receiver WTRU.
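
For illustration, display characteristics could be advertised during call setup with a custom SDP attribute; the attribute name and fields below are hypothetical, not part of the SDP standard.

```python
# Hypothetical custom SDP attribute carrying display characteristics.
display = {
    "width_px": 1920,
    "height_px": 1080,
    "density_ppi": 424,
    "max_contrast": 1500,
    "max_luminance_nits": 500,
}
sdp_attribute = "a=x-display:" + ";".join(f"{k}={v}" for k, v in display.items())
print(sdp_attribute)
```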

There may be a tradeoff between bandwidth and latency. Users (e.g., video telephony clients) individually may be able to trade between the latency and the offered load. Collectively, users may influence the tradeoff between the latency and the offered load. The offered load may be reduced by reducing the maximum spatial frequencies transmitted based on what is perceivable by a user at the receiver WTRU. The offered load may be reduced to provide a lower latency video stream. The lower latency video stream may have a larger impact on the user's perceived Quality of Experience than a reduced resolution. The rate selection may be determined to balance between the latency and the offered load (e.g., by considering the impacts of both local source coding distortion and distortion due to packet loss).

Savings in power consumption at the sender may be achieved by reducing the video capture resolution and/or frame rate, lowering the video encoding resolution, and/or reducing the quality of the video encoding (e.g., using single-pass rather than two-pass encoding).

The visible cut-off frequency (e.g., a spatial frequency boundary between perceptible and imperceptible details) may be determined from remote viewing conditions (e.g., using a CSF model of human visual perception). The visible cut-off frequency may be used to control pre-filtering. Pre-filtering may remove detail that might not be visible to the viewer at the remote device. Video may be captured and/or encoded at a resolution that may be determined by the local camera, with the video filtered (e.g., before encoding) to remove such detail. In some embodiments, system complexity may be reduced. The resolution of the capture and/or encoding may be reduced based on the cut-off frequency (e.g., such that the lower resolution may still represent the amount of detail that may be visible to the viewer at the remote device).

For example, when the cut-off frequency, f, is less than ½, the image may be reduced in resolution by a factor of ½/f. The cut-off frequency, f, may be expressed in the units of the original resolution. For example, if f is ⅙, the resolution may be reduced by a factor of 3. A resolution downscaling factor may be selected as a power of 2. As an example, if the cut-off frequency is less than ¼, the resolution of capture and/or encoding may be reduced by a factor of 2. The cut-off frequency may be expressed in the reduced (e.g., updated) resolution. Filtering (e.g., applying a pre-filter) may remove additional detail using the modified cut-off frequency.
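
The following is a minimal sketch (not part of the original disclosure) of how a sender might derive a power-of-two downscaling factor and the updated cut-off frequency; the function name and the Nyquist convention of ½ cycles/pixel are illustrative assumptions.

```python
def select_downscale(cutoff_freq):
    """Choose a power-of-two downscaling factor for capture/encoding.

    cutoff_freq: visible cut-off frequency in cycles/pixel of the original
    resolution (Nyquist limit is 1/2). Returns the downscale factor and the
    cut-off frequency re-expressed in the reduced resolution, which a
    pre-filter may then use to remove any remaining imperceptible detail.
    """
    factor = 1
    # Halve the resolution while the cut-off remains below 1/4 at the
    # current scale (i.e., below half of the new Nyquist limit).
    while cutoff_freq * factor < 0.25:
        factor *= 2
    updated_cutoff = cutoff_freq * factor  # in units of the reduced resolution
    return factor, updated_cutoff

# Example: f = 1/6 in original units -> downscale by 2, then filter at 1/3.
print(select_downscale(1.0 / 6.0))
```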

The screen (e.g., the display) may be partitioned such that the remote user may see multiple participants. For example, the display may be partitioned when more than two users participate in a video telephony session. The spatial resolution of the outgoing video may be reduced based on knowledge of the partitioned receiver display (e.g., to achieve substantial network resource savings). The consumption of network resources may be reduced by considering the smaller portion of the display in computing the maximum spatial frequency that may be perceived, as well as the contrast ratio and the reduction in display luminous emittance.

Devices (e.g., WTRUs) in a video telephony session may exchange information about their respective orientations. The image (e.g., video and/or picture) capturing and/or processing may be adapted (e.g., modified) according to the orientation of the remote device (e.g., so that the user experience of the displayed video and picture may be optimized). Video capturing and/or video processing may be adapted (e.g., modified) to the display orientation of a remote device (e.g., a WTRU that is receiving the video that is being captured and/or processed). FIG. 8 depicts an example video telephony system 800 in which WTRUs 802, 804 are in communication with one another via a communication network 806, such as the Internet. In the video telephony system 800 of FIG. 8, local video capturing may be based on the orientation of the remote device. For example, video capturing at WTRU 802 may be based on the orientation of WTRU 804, and video capturing at WTRU 804 may be based on the orientation of WTRU 802. By basing local video capturing on the orientation of the remote device, the issues illustrated in FIGS. 3 and 4 may be reduced or eliminated.

For a given orientation of the display of a device (e.g., a WTRU) relative to an observer, there may be several desired ways to display the video at the receiving device (e.g., a receiving WTRU), as shown in FIGS. 9A-9D. For example, FIG. 9A illustrates a receiving WTRU 902 displaying a video using as much of the display (e.g., screen) as possible, with the up direction along the length or the width of the screen. FIG. 9B illustrates a receiving WTRU 904 displaying a video using as much of the screen as possible, with the up direction determined (e.g., uniquely determined) by the eyes of the receiving WTRU 904 user (e.g., regardless of the orientation of the receiving WTRU). FIG. 9C illustrates a receiving WTRU 906 displaying a video using the entire screen, with the up direction either along the length or the width of the screen. FIG. 9D illustrates a receiving WTRU 908 displaying a video using the entire screen, with the up direction determined (e.g., uniquely determined) by the eyes of the receiving WTRU 908 user (e.g., regardless of the orientation of the receiving WTRU).

Video stream modifications (e.g., adaptations) may be performed at the sending device (e.g., at the sender side). Video stream modifications may include one or more of the following: cropping, downsizing, down sampling, zooming, or adaptive video capturing. FIGS. 10A-10B illustrate an example of sender-side cropping. The sender-side WTRU may capture the video and may crop the video based on one or more viewing parameters of the receiver WTRU 1002, 1004 (e.g., according to the video format that is best for a receiver-side WTRU 1002 or 1004). The sender-side WTRU may encode the cropped video, as shown in FIGS. 10A-10B. As shown in FIG. 10B, the sides of the cropped region may be neither parallel nor perpendicular to the sides of the captured picture (e.g., of the video).

FIGS. 11A-11B illustrate an example of sender-side downsizing or down sampling. Video capturing may be based on the display orientation of a sender-side WTRU. Based on the orientation of the receiver WTRU, the captured pictures (e.g., of the video) may be modified (e.g., downsized or down sampled) to fit the display of a receiver-side WTRU 1102 or 1104.

A device may employ adaptive video capturing. A subset (e.g., a proper subset) of the image sensors may generate (e.g., be selected to generate) the pictures (e.g., the video) for video capturing. The subset may be determined based on the orientation of the receiver WTRU (e.g., the display of the receiver WTRU). The pictures may have the same resolution as the resolution of the display of the receiver WTRU. The resolution of the image sensor array may be higher than the resolution (e.g., the video resolution) at the display of the receiver WTRU. The subset of image sensors may be selected based on the resolution of the receiver WTRU (e.g., the display of the receiver WTRU). FIGS. 12A-12B illustrate an example of image sensor selection.

For example, the sender-side WTRU may have an electronically controlled variable focal length lens (e.g., a digitally controlled mechanical zoom). FIG. 12A depicts a subset of the pixels in an image sensor 1202 being utilized. The subset of the pixels may be satisfactory if the user is satisfied with the Field of View (FOV) and the display resolution approximately matches the image capture resolution. Electronically adjusting the focal length of the lens (e.g., by zooming out to use more image sensors) may utilize (e.g., fully utilize) the image sensor resolution and may maintain the same FOV (e.g., if the display has more pixels than the subset of image sensor pixels), as shown in FIG. 12B.

Far end (e.g., remote) camera control may be utilized (e.g., if a WTRU is conferencing with a traditional video conferencing client) to adjust the far end camera. The far end camera may be adjusted (e.g., adjusted as appropriate) for the mobile display.

FIG. 13 illustrates an example in which an image sensor array 1302 may be rotated. The image sensor array 1302 may be rotated (e.g., via electromechanical devices) to match the orientation of the receiver (e.g., the display of the receiver). The image sensor array 1302 may be rotated, for example, to utilize each image sensor in the video capturing process. Rotation of the image sensor array 1302 may yield the highest possible resolution of the video when there is sufficient bandwidth in the communication network.

A WTRU (e.g., the video receiver) may send orientation information to another WTRU (e.g., the video sender). The orientation information may include, but is not limited to, the height and width of the desired video pictures and/or the up direction relative to the video picture. For example, the up direction may be an angle A relative to the width, as shown in FIG. 14. The orientation information may specify an angular orientation of the picture. For example, the direction might not encode (e.g., explicitly encode) the “up” direction. The direction may encode a “down” direction, a “left” direction, and/or any other known orienting direction relative to the picture. “Up direction” may establish an orientation of a video picture and/or of a device display.

A device (e.g., the video sender) may determine its own up direction (e.g., after receiving the orientation information). The device may determine the picture that it needs to capture. For example, the width may be in the direction −A, and the height may be in the direction (90−A)°. The device may receive a height and width of the desired video pictures. The device may determine how many pixels may be in the width direction and/or the height direction based on the received width and height from the video receiver.

The video receiver may determine the width and height and/or up direction by measuring the location of the eyes relative to the display.

The video receiver may detect the direction of an eye-axis 1502 of the user, as shown in FIG. 15. For example, the video receiver may analyze video captured using a front camera of the device. The video receiver may detect a face and/or detect eye positions in the captured video. The video receiver may determine the direction of the eye-axis 1502 based on the eye positions.

The video receiver may determine the up direction for the video to be displayed. The video receiver may project the eye-axis 1502 of the user onto a display plane. FIG. 16 illustrates a projection 1602 of the eye-axis 1502 onto the display plane. If the angle of this projection 1602 relative to the x-axis is B°, the up direction of the video to be displayed may be determined, for example, either as (B+90)° or as a function ƒ((B+90)°). The function ƒ may be a quantization function that may be defined, for example, as follows:

${f(z)} = \left\{ {\begin{matrix}{0^{{^\circ}},{{\&\; {- 45^{{^\circ}}}} < z \leq 45^{{^\circ}}}} \\{90^{{^\circ}},{45^{{^\circ}} < z \leq 135^{{^\circ}}}} \\{180^{{^\circ}},{135^{{^\circ}} < z \leq 225^{{^\circ}}}} \\{270^{{^\circ}},{255^{{^\circ}} < z \leq 315^{{^\circ}}}}\end{matrix}.} \right.$

The up direction of the video may be determined as (B+90)°. For example, if the up direction is uniquely determined by the eyes, regardless of the orientation of the receiver, the up direction may be determined as (B+90)°, as shown in FIGS. 9B and 9D. The up direction of the video may be determined using a quantization function. For example, the up direction may be determined using a quantization function if the up direction is either along the length or the width of the screen, as shown in FIGS. 9A and 9C. Determination of the “up” direction may be based, for example in part, on orientation sensors that may be present in the receiver device. For example, orientation sensor readings may supplement eye-axis tracking (e.g., during periods where the eye tracking algorithm might not reliably determine eye positions).
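
A minimal sketch (not part of the original disclosure) of the up-direction determination described above: the quantization function ƒ from the equation, applied to (B+90)°. The function names are hypothetical.

```python
def quantize_angle(z):
    """Quantization function f(z): snap an angle to 0/90/180/270 degrees."""
    z = z % 360.0
    if z > 315.0:
        z -= 360.0          # map (315, 360) onto (-45, 0]
    for axis in (0.0, 90.0, 180.0, 270.0):
        if axis - 45.0 < z <= axis + 45.0:
            return axis
    return 0.0

def up_direction(eye_axis_proj_deg, snap_to_screen_axes=True):
    """Up direction of the displayed video from the projected eye-axis angle B.

    eye_axis_proj_deg: angle B of the eye-axis projection relative to the
    display x-axis. If snap_to_screen_axes is True, the result is quantized
    to the screen's length/width axes (FIGS. 9A/9C); otherwise it follows
    the eyes exactly (FIGS. 9B/9D).
    """
    raw_up = (eye_axis_proj_deg + 90.0) % 360.0
    return quantize_angle(raw_up) if snap_to_screen_axes else raw_up

print(up_direction(10.0))          # (10 + 90) = 100 degrees, snaps to 90
print(up_direction(10.0, False))   # 100.0
```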

FIG. 17 illustrates an example call flow 1700 for capturing video locally according to the orientation of a remote device. The orientation of the remote device may be used to control the local video capturing (e.g., video capturing at a local device). WTRUs 1702, 1704 may include respective sensors 1706, 1708 for detecting the respective orientations of WTRUs 1702, 1704. The sensors 1706, 1708 may sense, for example, gravity and/or acceleration.

The orientation of the remote device may be between a portrait orientation and a landscape orientation. For example, the orientation may have three degrees of freedom in a three-dimensional space. The sensor 1706 or 1708 may estimate the orientation of the WTRU 1702, 1704. The estimate may be used to determine which display format (e.g., portrait or landscape) is best given the orientation (e.g., actual orientation) of a display 1710, 1712. The orientation may be a binary classification, e.g., the orientation of a display may be classified as either portrait or landscape even though the actual orientation of the display is somewhere between portrait (e.g., purely portrait) and landscape (e.g., purely landscape). Detection of the orientation of the display by the sensor may use the outcome of this binary classification.
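
One way the binary classification might be realized is sketched below (not from the original disclosure), assuming a gravity reading in device coordinates; the axis convention and threshold are illustrative.

```python
def classify_orientation(gx, gy):
    """Binary portrait/landscape classification from a gravity reading.

    gx, gy: gravity components along the device's short (x) and long (y)
    display axes. When gravity lies mostly along the long axis the device
    is held upright (portrait); otherwise it is treated as landscape, even
    if the actual orientation is somewhere in between.
    """
    return "portrait" if abs(gy) >= abs(gx) else "landscape"

print(classify_orientation(0.5, 9.7))   # portrait
print(classify_orientation(9.6, 1.2))   # landscape
```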

At 1714, if at time t₀, the orientation of WTRU 1702 is portrait and the orientation of WTRU 1704 is landscape, the sensor 1706 of WTRU 1702 may detect the portrait orientation and may send the information to a protocol stack 1716 of WTRU 1702. At 1718, the information about the orientation of WTRU 1702 may be sent to WTRU 1704. At 1720, a protocol stack 1722 of WTRU 1704 may inform a camera 1724 of WTRU 1704 of the orientation of WTRU 1702. At 1726, the camera 1724 may capture video according to the orientation of WTRU 1702 (e.g., portrait), and may send the video to the protocol stack 1722. At 1728, the protocol stack 1722 may send the encoded video in the orientation of WTRU 1702 (e.g., portrait) to the protocol stack 1716 of WTRU 1702. At 1730, the protocol stack 1716 of WTRU 1702 may decode the video and may send the decoded video to the display 1710 of WTRU 1702. The decoded video may agree with the orientation of WTRU 1702.

As shown in FIG. 17, similar processes may be performed, with the roles of WTRUs 1702, 1704 reversed, for detecting the orientation of WTRU 1704, e.g., landscape, and using the detected orientation of WTRU 1704 to control the capture of video by a camera 1732 of WTRU 1702.

At 1734, if at time t₁, the orientation of WTRU 1704 has changed from landscape to portrait, the sensor 1708 may detect the change in orientation and may inform the protocol stack 1722 of WTRU 1704 of the change in orientation. The protocol stack 1722 may inform the WTRU 1702 of the change in orientation at 1736. At 1738, the protocol stack 1716 of WTRU 1702 may inform the camera 1732 of WTRU 1702 of the change. At 1740, the camera 1732 may send the captured video, which may be in portrait format, to the protocol stack 1716 of WTRU 1702. At 1742, WTRU 1702 may send the video (e.g., with the new portrait format) to WTRU 1704. At 1744, the protocol stack 1722 of WTRU 1704 may decode and send the video, in portrait format, to the display 1712 of WTRU 1704. The decoded video may agree with the orientation (e.g., the new portrait orientation) of WTRU 1704.

The camera may capture video according to its local orientation and may crop the captured video according to an orientation of a remote device. The cropping may take place at a preset region of the captured video. The cropping may cover the region of interest, e.g., a human.

Adaptation to viewing conditions may be performed in multi-party video conferencing, e.g., using a single camera with multiple participants. In an embodiment, the adaptation to viewing conditions may be determined by the closest user to the display or by the user with the most stringent requirements. In an embodiment, the adaptation to viewing conditions may be determined based on the closest attentive user. The adaptation to viewing conditions may be determined based on a subset of users (e.g., users who are paying attention to the video). For example, if there is a user who is close to the display but is not viewing the display, as determined by face detection logic, adaptation may be determined based on the next closest user who is determined to be viewing the display.

In multi-party video conferencing, different bit rates may be allocated to different participants. Relative priority may be assigned statically. For example, the CEO may always get the most bandwidth in a business conferencing application. Relative priority may be assigned dynamically (e.g., based on an activity or lack of activity, such as speaking or not speaking). A speaking participant may be allocated more bandwidth (e.g., more video bandwidth) than a non-speaking participant. A point of attention of one or more receivers (e.g., receiver devices) may be used to allocate bandwidth. Speech activity may be used to determine bandwidth priority. An active speaker may be selected and routed to the others (e.g., by a control bridge). Hard switching may be replaced with user adaptive options based on, for example, voice activity or other criteria.

For example, one or more devices (e.g., users) may transmit video to the MCU. An MCU may select video from at least one device (e.g., a few devices) to broadcast. The MCU may broadcast a mix of the video from the selected devices into a single screen. The devices may be selected based on voice activity detection, for example. The MCU may influence the one or more transmitting devices such that the video sent from the selected device (or the selected few devices) may be sent to the MCU at a higher quality (e.g., a higher bit rate or higher resolution) than the other transmitting devices. The MCU may influence the encoding of the one or more transmitting devices using signaling (e.g., a message requesting the sender to change its sending bit rate). The MCU may indicate (e.g., signal) a bit rate for the one or more transmitting devices. The MCU may influence the bit rate of the one or more transmitting devices using traffic shaping techniques and/or feedback “tricking” techniques (e.g., setting artificial conditions and/or values, perhaps in feedback sent to a client).

Video available from one or more, or all, devices may enable lower latency switching. The MCU may select at least one video stream (e.g., the video stream from the current speaker) for broadcast to one or more (e.g., all) devices. The device of the selected video stream (e.g., the current speaker) may receive a video stream from another device (e.g., a previous speaker). The MCU may form a composite image of one or more devices, which may be broadcast to one or more devices (e.g., conference participants). The MCU may send (e.g., forward) one or more selected video streams (layers may be scalably coded) to one or more (e.g., each) client. The client may arrange the sent video streams for display locally. One or more (e.g., all) video streams may be sent to one or more (e.g., all) devices. The one or more devices may configure the display locally based on the received video streams. The video stream from the device of the current speaker may be sent to other devices while, for example, video from a different device may be sent to the device of the current speaker.

Detection of user gaze may be used to control rate allocation within different portions of a frame. The region within a frame at which an individual is looking may be localized to improve the quality in that region. For example, the viewer may be focused on the middle of the screen when he or she is attentive.

The MCU may include user adaptive video functionality. The MCU may monitor the video streams that may be produced by one or more, or each, endpoint (e.g., device). FIG. 18 shows an example of UAV processing being applied in an MCU 1800 for a single connection. The video stream from Client 1 1810 may be analyzed by the UAV-1 Analysis module 1820 of the MCU 1800 (e.g., to estimate viewing conditions for Client 1 1810). The analysis may be used to determine a viewing parameter (e.g., a viewing condition) associated with Client 1 1810. Analyzing the video stream to determine a viewing parameter may include analyzing video packet data, video bit stream data, side information, and/or control signaling associated with the video stream. Control signaling may be conveyed, for example, using SIP messages, H.245 messages, HTTP requests/responses, and/or the like. The viewing parameter may be used to control the video processing, transcoding, and/or video mixing process of the video sent to Client 1 1810. The UAV may be used for more than a single link, or for all connections in the MCU. The MCU may determine one or more viewing parameters for one or more, or each, endpoint.

The MCU may determine as many viewing parameters as possible from the video stream (e.g., without requiring additional signaling). Viewing parameters (e.g., viewing parameters which may be relatively more useful than other viewing parameters) may include viewing distance, scaled by display size, and/or ambient lighting level, among others. Other viewing parameters (e.g., viewing parameters which may be derived from analysis of the video) may include user presence, user attentiveness, and/or motion of the display relative to the user, among others. Any of these parameters or any combinations thereof may be used for adapting video sent by the MCU to a device. One or more viewing parameters may be determined by face detection applied to the video stream received from the device. When no face is detected, a viewing parameter may include a conservative viewing distance. A conservative distance may be as small a distance as might be practically used (e.g., a smallest practical distance to view a device of a specific type). At farther distances UAV may be more aggressive in removing detail, so a conservative distance may be chosen, for example, to preserve more detail, among other reasons. A common “close” viewing distance of 3 picture heights of the display may be used for the conservative distance. Expressing the distance in units of picture height scales with display size and may be consistent with the distance used in UAV.

The viewing distance may be determined based on a situation where the camera may be located at the same distance from the user as the display. The MCU may analyze the video (e.g., a video stream) from a sender (e.g., a sender device) to determine the user-to-camera distance. The MCU may determine the user-to-display distance based on the user-to-camera distance. Determination of the user-to-camera distance from the analyzed video may be based on face detection. Determination of the user-to-camera distance may utilize knowledge of the camera setup on the sender device. For example, the MCU may identify a sender's device type (e.g., manufacturer, model, or other device identifier) using signaling or user profile information. The MCU may identify a software video-telephony client in use by the sender. The software video-telephony client may configure the camera and/or may be known to perform image resizing/rescaling in a particular way. Signaling between the sender device and the MCU may include identifiers for the sender device and/or the video-telephony client software. For example, a “User Agent” field or another similar field may identify the manufacturer, the model number, the device type, and/or the software client. The MCU may retrieve device viewing parameters of the camera setup which may correspond to the sending device and/or the software client from a database. The device viewing parameters may be based on identification of the sender device and/or the video-telephony software client, among other factors. The device viewing parameters (e.g., properties of the sender device camera and/or the typical camera configuration or scaling used by the sender software client) may be used together with face detection, for example, to determine the user-to-camera distance and/or the user-to-display distance, among other determinations.
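
A minimal sketch of one way the user-to-camera distance could be estimated from face detection under a pinhole-camera assumption; the average face width, focal length lookup, and fallback of 3 picture heights are illustrative assumptions, not the disclosed algorithm.

```python
AVERAGE_FACE_WIDTH_CM = 14.0   # assumed typical adult face width

def estimate_viewing_distance(face_width_px, focal_length_px,
                              display_height_cm=None):
    """Estimate user-to-camera distance with a pinhole-camera model.

    face_width_px: width of the detected face in the received video.
    focal_length_px: sender camera focal length in pixels, e.g. looked up
    from a device/software-client database keyed by a "User Agent" field.
    If display_height_cm is known, the distance is also returned in picture
    heights, the unit used elsewhere for UAV adaptation.
    """
    if face_width_px <= 0:
        # No face detected: fall back to a conservative "close" distance of
        # 3 picture heights so that detail is preserved.
        return None, 3.0
    distance_cm = focal_length_px * AVERAGE_FACE_WIDTH_CM / face_width_px
    picture_heights = (distance_cm / display_height_cm
                       if display_height_cm else None)
    return distance_cm, picture_heights

print(estimate_viewing_distance(200, 600, display_height_cm=15.0))
```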

One or more viewing conditions (e.g., viewing parameters) may be determined and/or estimated for at least one endpoint (e.g., device). The video sent from the MCU may be modified (e.g., to remove visually redundant information). The video sent from the MCU may be modified based on the viewing conditions estimated for the endpoint (e.g., from analysis of video sent from that endpoint to the MCU), among other factors. The MCU may employ direct transcoding techniques and/or indirect traffic shaping techniques.

Device types and/or calling scenarios may be used to estimate the user-to-display distance. As an example, at least three usage modes can be considered for illustration: a conference room, an individual fixed device (e.g., a PC), and an individual handheld device. The conference room mode may include at least one preferred distance. The preferred distance may be based on the typical camera-to-user distance used in conference rooms. The preferred distance may be based on a specific conference room setup. The conference room usage mode may be determined by detecting a number of faces in an image (e.g., multiple faces may suggest a conference room scenario). The MCU may track (e.g., detect) zooming operations (e.g., as a conference room video conferencing system may support camera zoom). The MCU may detect zooming operations by analyzing changes in the background details present in the image (e.g., the video image). A “typical” or “normal” viewing distance for an office PC may be used, for example, if the conditions suggest an individual fixed device (e.g., a single face detected, little or no camera motion). The viewing distance may be estimated based on the relative size of a face in the image, for example, if the conditions suggest a handheld device (e.g., a single face detected, camera motion due to a non-stationary device). The instability of a scene may be used to infer that a handheld device is in use (e.g., rather than a fixed camera, as with a conference room or a PC with a web cam). The usage mode may be determined based on a codec description during call setup. The codec description may enable low (e.g., extremely low) complexity implementation (e.g., when combined with traffic shaping discussed herein).
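
The usage-mode heuristic described above might be sketched as follows (illustrative only; the mode names, inputs, and decision order are assumptions).

```python
def infer_usage_mode(num_faces, camera_motion, zoom_detected=False):
    """Heuristic usage-mode classification for viewing-distance defaults.

    num_faces: faces detected in the received video.
    camera_motion: True if background analysis suggests a non-stationary
    (handheld) camera.
    zoom_detected: True if changes in background detail suggest a
    conference-room system with camera zoom.
    """
    if num_faces > 1 or zoom_detected:
        return "conference_room"     # use the preferred room distance
    if num_faces == 1 and not camera_motion:
        return "fixed_pc"            # use a typical office-PC distance
    if num_faces == 1 and camera_motion:
        return "handheld"            # estimate distance from face size
    return "unknown"                 # fall back to a conservative default

print(infer_usage_mode(num_faces=3, camera_motion=False))   # conference_room
print(infer_usage_mode(num_faces=1, camera_motion=True))    # handheld
```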

The MCU may be in the cloud (e.g., the Internet). The MCU may be replaced by a device (e.g., a more generic device) that may perform the functions of an MCU, such as call management and/or transcoding, as well as other functions that an MCU might not perform. There may be no limitation on the clients, which may be running WebRTC or other video-telephony software.

A device may control a video communication via transcoding. The device may include a multipoint control unit (MCU). The MCU may modify a video stream from one participant based on the viewing parameter(s) associated with the other participants. For example, the device may receive a first video stream from a first device and a second video stream from a second device. The device may receive a third video stream from a third device. The device may receive a fourth video stream from the second device. The device may analyze the first video stream to determine a first viewing parameter associated with the first device. The device may analyze the second video stream to determine a second viewing parameter associated with the second device. The device may analyze the third video stream to determine a third viewing parameter associated with the third device. The viewing parameter may include a user viewing parameter, a device viewing parameter, and/or a content viewing parameter. The device may modify the second video stream based on the first viewing parameter and/or the third viewing parameter. The device may modify the first video stream based on the third viewing parameter and/or the second viewing parameter. The device may modify the fourth video stream based on the third viewing parameter. Modifying the video stream may include re-encoding the video stream, adjusting an orientation, removing a video detail, filtering, and/or adjusting a bit rate. The device may send the modified second video stream to the first device and/or the third device. The device may send the modified first video stream to the second device. The device may send the modified fourth video stream to the first device and/or the third device. The device may compare bit rates associated with the first viewing parameter and the third viewing parameter. The device may compare bit rates associated with one or more viewing parameters at predetermined time intervals. The device may compare bit rates associated with one or more viewing parameters continuously. The device may compare bit rates associated with one or more viewing parameters when prompted. The device may modify one or more video streams based on the viewing parameter corresponding to the most stringent quality requirement. The most stringent quality requirement may be determined based on the one or more viewing parameters. For example, when the third viewing parameter is associated with a higher bit rate than the first viewing parameter, the device may modify the fourth video stream based on the third viewing parameter.

Viewing conditions (e.g., distance information and/or ambient illumination) may be used to adjust (e.g., reduce) the bit rate of a video stream(s) produced by the MCU. The viewing conditions may include one or more viewing parameters. The MCU may employ an active transcoding and/or encoding solution (e.g., for rate adaptation and/or continuous presence functionality, among other scenarios). UAV via MCU transcoding may support a number N (e.g., an integer value) of clients in a call. N may be greater than two. The MCU may analyze the video it receives from one or more, or each, client (e.g., to determine user adaptive viewing parameters for the client, among other reasons). The MCU may modify a video stream based on one or more of the determined viewing parameters. The determined viewing parameters may also be referred to as user adaptive viewing conditions.

UAV viewing parameters may be determined using face detection, for example, to determine the user-to-display distance. FIG. 19A illustrates an example MCU 1900 with a decoder 1902, 1904, a face detector 1906, 1908, and an encoder 1910, 1912 for each client 1914, 1916 (e.g., device). The encoder 1910, 1912 may be controlled by a face detector 1906, 1908. The face detector 1906, 1908 may detect faces in the video received from a device (e.g., the client 1914, 1916). The face detector 1906, 1908 may control the encoder 1910, 1912 when a separate encoder is used for one or more, or each, device, as illustrated in FIG. 19A. For example, viewing parameters such as viewing distance, user presence, user attentiveness, ambient illuminance, etc., may be derived from the received, decoded video stream. The encoder 1910, 1912 may receive one or more viewing parameters from a device. The encoder 1910, 1912 may encode video based on the one or more viewing parameters. The encoder 1910, 1912 may send the encoded video to the device. The encoder 1910, 1912 may remove details (e.g., details which are not likely to be perceptible to a user of the device) from the encoded video. The encoder 1910, 1912 may adjust (e.g., reduce) the bit rate of the video stream transmitted to the device.

FIG. 19B illustrates an example of an encoder configuration where video from one client may be sent to multiple receiving clients (e.g., devices). The MCU may receive video content (e.g., a video stream) from multiple clients. The MCU may send the video content to the multiple clients (e.g., simultaneously). One or more, or each, client may have local viewing conditions. The MCU may analyze the video content received from one or more, or each, of the multiple video clients. The MCU may determine one or more UAV viewing parameters based on the analysis of the video content. The one or more UAV viewing parameters may describe the local viewing conditions at one or more, or each, of the clients. The UAV viewing parameters may include viewing distance, user presence, user attentiveness, and/or ambient illuminance, etc. The MCU may modify (e.g., adapt) the video content sent to a client based on the one or more UAV viewing parameters which describe local viewing conditions at the client. The MCU may determine how to encode the video content to be sent to a client based on one or more UAV viewing parameters from the client.

FIG. 19B highlights an example video path to illustrate the adaptation of video to multiple clients. Client #1 may send video content in the form of a sent video stream (e.g., a video bit stream) to the MCU. The MCU may decode the video content from Client #1. The MCU may analyze the video content. The MCU may determine one or more UAV viewing parameters (e.g., distance between user and display) associated with Client #1 by analyzing the video content. The MCU may receive video content sent from other clients (e.g., Client #2, Client #3, etc.). The MCU may receive the video content from other clients at the same time it receives the video content from Client #1. The MCU may analyze the video content from the other clients. The MCU may determine one or more UAV viewing parameters, which may describe the local viewing conditions at the other clients, by analyzing the video content from the other clients. The MCU may utilize the determined UAV viewing parameters for the other clients to encode the video content from Client #1 (e.g., in order to adapt the video content to the local viewing conditions at the other clients, among other reasons). As illustrated in FIG. 19B, UAV viewing parameters, determined from analysis of the video content from Client #2, may be passed to an encoder which may encode the video content from Client #1. The UAV viewing parameters may be used to adapt (e.g., modify) the video content of Client #1 to the local viewing conditions at Client #2. UAV viewing parameters determined from analysis of the video content from Client #3 may be passed to an encoder which may encode the video content from Client #1. The UAV viewing parameters may be used to adapt (e.g., modify) the video content of Client #1 to the local viewing conditions at Client #3. Video adaptation (e.g., modification) may be extended to any number of clients which may be sending video to and/or receiving video from the MCU.

FIG. 20 depicts an example of an MCU with video mixing and a shared encoder. The shared encoder may be a single encoder. The shared encoder (e.g., a single encoder) may be shared among a variety of endpoints, as shown in FIG. 20. The shared encoder may be used with a continuous presence video mixer. The MCU may include a face detector. The MCU may include a face detector for each endpoint (e.g., each connected client endpoint) such that faces may be detected in any or all of the video streams received by the MCU. The shared encoder may receive input from the face detector corresponding to an endpoint. The shared encoder may receive input from one or more, or all, face detectors corresponding to endpoints which may receive the video. The face detector corresponding to an endpoint may determine a viewing parameter based on the viewing conditions at the endpoint. The encoder may modify the video based on the viewing parameter. The encoder may modify the video based on a “worst case” selection amongst viewing conditions of multiple endpoints (e.g., to reduce bit rate without impacting the quality perceived by the most critical viewer, among other reasons). For example, the viewing distances from one or more, or all, of the clients receiving the encoded video content may be estimated using face detection. The smallest estimated viewing distance may be used to adapt (e.g., modify) the video (e.g., the video encoding process).
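
A minimal sketch of the “worst case” selection for a shared encoder: pick the smallest estimated viewing distance among the receiving endpoints. The function name and the conservative default of 3 picture heights are illustrative.

```python
def worst_case_distance(estimated_distances, conservative_default=3.0):
    """Pick the most critical viewing distance among receiving endpoints.

    estimated_distances: viewing distances (in picture heights) estimated by
    face detection for each endpoint that will receive the shared encoder's
    output; None entries mean no face was detected. The smallest distance is
    the most demanding viewer, so the shared encoder removes the least detail.
    """
    known = [d for d in estimated_distances if d is not None]
    return min(known) if known else conservative_default

print(worst_case_distance([4.5, 6.0, None, 3.2]))  # 3.2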

One or more clients may provide UAV viewing parameters to the MCU. The UAV viewing parameters may include viewing distance, user presence, user attentiveness, ambient illuminance, and/or display properties such as display size and/or display resolution, among others. The UAV viewing parameters may be signaled from the client to the MCU. The UAV viewing parameters may be signaled using a call setup protocol and/or a call control protocol (e.g., H.245, SIP, etc.). The MCU may use a UAV viewing parameter to adapt (e.g., modify) the encoding of video content sent to that client. The MCU may modify the video content when UAV viewing parameters are explicitly signaled by a client. The client may send viewing parameters associated with face detection to the MCU.

The MCU may perform one or more of the orientation adaptation techniques described herein. The MCU may act as a video sender and/or may perform the adaptation tasks attributed to the sender. A video receiver client may receive video from the MCU. The video receiver client may send orientation information (e.g., one or more viewing parameters) to the MCU. The orientation information may include the height and/or width of the video pictures which the video receiver may find useful to receive. The orientation information may include an “up direction” for the video picture. The MCU may analyze the video (e.g., the features of the video) received from a client to infer the orientation information of that client. The MCU may infer the orientation information without explicit signaling of the orientation information from that client. For example, the MCU may calculate the angle between a door frame and a length direction of the video. The MCU may adapt (e.g., modify) video sent to the video receiver client using the various techniques described herein (e.g., sender-side cropping, sender-side down-sampling, image resampling, image rotation, and/or the like). The MCU may modify the video based on the orientation information. The MCU may adapt (e.g., modify) the orientation of video content received from a sending client. The MCU may modify the orientation of video content before sending the video content on to a receiving client. The MCU may tailor (e.g., individually tailor) the orientation adaptation to one or more of the multiple clients. The MCU may tailor the orientation adaptation based on orientation information received from one or more of the multiple video clients.

A device may control a video communication via traffic shaping. The device may include an MCU. The device may receive a first video stream from a first device and a second video stream from a second device. The device may determine a viewing parameter associated with the first device by analyzing the first video stream. The viewing parameter may include a user viewing parameter, a device viewing parameter, and/or a content viewing parameter. The device may determine, based on the viewing parameter, a video stream bit rate for the second video stream. The device may indicate the video stream bit rate to the second device. The device may indicate the video stream bit rate by removing one or more packets from the second video stream before sending it to the first device.

FIG. 21 illustrates an example of a technique in which the MCU may influence the encoding rate of one or more sending clients. The MCU may determine one or more UAV viewing parameters for a client. The MCU may detect viewing condition information (e.g., the viewing position) of the N^(th) client based on analysis of the video stream from the N^(th) client. N may refer to a connected client. The MCU may determine (e.g., compute) the viewing distance of the N^(th) client by monitoring the video from the N^(th) client. The viewing distance may influence the encoder, which may be encoding video sent to the MCU, to implement UAV. The encoder might not be in the MCU. The encoder may be part of a client endpoint that sends video to the MCU. The MCU may switch video traffic with or without transcoding the video streams. The MCU may monitor the video traffic to detect faces, as illustrated in FIG. 21. The MCU-decoded image might not be seen except, perhaps, by the face detector, which may permit complexity reductions in the decoder (e.g., where appropriate). For example, the face detector might not operate at the full frame rate at which the sender encoded the video, in which case the decoder that provides decoded video to the face detector may operate at less than the full frame rate.

The decoder may decode intra-coded frames (e.g., frames without prediction in time). In an embodiment, the decoder may decode only intra-coded frames. The video stream may have various layers. The decoder may decode a subset of the full video (e.g., a reduced frame rate or resolution). The detected viewing condition information may be signaled directly or indirectly to the encoding client device (e.g., the client device encoding the video). The viewing condition information may influence an encoder at another client. The viewing condition information may be signaled directly or indirectly to the encoder at another client. The i^(th) client may adjust (e.g., modify) its encoded bit rate based on an observed channel bandwidth. The bandwidth use of the encoder of the i^(th) client may be influenced by shaping the measurements on the video traffic originating from the i^(th) encoder. The stream selection logic may control which clients are connected to which other clients. The stream selection logic may control which video streams the MCU routes to which clients. The encoded output of the i^(th) client may be seen by one or more (e.g., several) other clients, say C={j₁, j₂, . . . , j_(k)}. For one or more, or each, client i, the MCU may monitor (e.g., determine) the viewing distances of the clients in C to which this client's video may be sent. The traffic originating from the i^(th) encoder may be shaped to correspond to the bandwidth reduction corresponding to the nearest of the clients in C.

The MCU may shape the traffic using one or more of the following:

The MCU may “trick” a video sender to decrease a sending rate;

The MCU may throttle the throughput;

The MCU may intentionally drop, mark and/or delay packets; and/or

The MCU may employ signaling to instruct the sender to change its sending bit rate.

FIG. 22 illustrates an example logical connection among video conferencing participants and the MCU. With reference to FIG. 22, the media data may be sent from a sender S to the MCU. The MCU may send the media data to the other participants (R1, R2, and R3). One or more, or each, participant may have a separate connection (e.g., RTP/UDP) with the MCU. In some embodiments, there might be a network between S and the MCU, and/or between the MCU and R1, and so on; this is not shown in FIG. 22 for simplicity.

A device may indicate the video stream bit rate by sending a feedback message that indicates an adjusted packet loss rate. The device may include an MCU. The device may measure a packet loss rate for the second video stream. The device may determine the adjusted packet loss rate for the second video stream. The adjusted packet loss rate may be associated with the determined video stream bit rate. The adjusted packet loss rate may differ from the measured packet loss rate. The device may generate a feedback message that indicates the adjusted packet loss rate. The device may send the feedback message to the second device.

The device (e.g., the MCU) may “trick” (e.g., create artificial conditions and/or values, among other manipulations, or the like) a video sender, for example, to decrease the sending rate, among other reasons. For example, WebRTC may implement congestion control to adapt (e.g., modify) the sending rate of the video to the available bandwidth in the network (e.g., so that WebRTC may be TCP friendly). The sending rate of the video may be reduced when a packet loss rate increases. The packet loss rate may be measured (e.g., determined) by a receiver. The receiver may include the MCU. The receiver may include a client device. The measurement of the packet loss rate may be sent to the video sender. The packet loss rate may be sent periodically. Referring to FIG. 22, the MCU may receive a media flow (e.g., a video stream) originating from a sender S. The MCU may indicate a packet loss rate higher than the measured packet loss rate (e.g., by leveraging the feedback mechanism). The adjusted (e.g., inflated) packet loss rate may be determined (e.g., determined by the MCU) based on a bit rate (e.g., the target bit rate) corresponding to one or more viewing parameters. The adjusted packet loss rate may be based on a determined video stream bit rate for the sender. The MCU may generate a feedback message that indicates the adjusted packet loss rate. The MCU may send the feedback message to a device (e.g., the sender). The adjusted packet loss rate may be based on a “worst-case” viewing condition (e.g., the nearest viewing distance of the other participants). The MCU may determine an adjusted packet loss rate higher than the measured packet loss rate (e.g., in order to reduce the sending bit rate of sender S). The MCU may determine an adjusted packet loss rate lower than the measured packet loss rate (e.g., so that the sending bit rate of sender S increases).
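
One way the adjusted loss rate could be chosen is sketched below (not from the original disclosure); the fixed step size, ceiling, and 90% tolerance are illustrative assumptions.

```python
def adjusted_loss_rate(measured_loss, measured_bitrate, target_bitrate,
                       step=0.02, max_loss=0.25):
    """Compute the loss rate to report in receiver feedback.

    If the sender's observed bit rate exceeds the target derived from the
    worst-case viewing condition, the reported loss is inflated so the
    sender's congestion control backs off; if the sender could usefully
    increase its rate, the reported loss is deflated (never below zero).
    """
    if measured_bitrate > target_bitrate:
        return min(measured_loss + step, max_loss)
    if measured_bitrate < 0.9 * target_bitrate:
        return max(measured_loss - step, 0.0)
    return measured_loss

print(adjusted_loss_rate(0.01, 2800, 2300))  # 0.03: nudge the sender down
```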

WebRTC may include a congestion control mechanism. The sender and/or the receiver may estimate the available bandwidth. The sender-side estimate A_(s)(t_(k)) at time t_(k) may be as follows:

${A_{s}\left( t_{k} \right)} = \left\{ \begin{matrix}{{\max \left\{ {{X\left( t_{k} \right)},{{A_{s}\left( t_{k - 1} \right)}\left( {1 - {0.5\mspace{11mu} {p\left( t_{k} \right)}}} \right)}} \right\}},} & {{{if}\mspace{14mu} {p\left( t_{k} \right)}} > 0.10} \\{{1.05\left( {{A_{s}\left( t_{k - 1} \right)} + {1\mspace{14mu} {kbps}}} \right)},} & {{{if}\mspace{14mu} {P\left( t_{k} \right)}} < 0.02} \\{{A_{s}\left( t_{k - 1} \right)},} & {otherwise}\end{matrix} \right.$

where p(t_(k)) is the packet loss rate at time t_(k) and where X(t_(k)) is the TCP friendly rate

${{X\left( t_{k} \right)} = \frac{8s}{{R\; T\; T\sqrt{2\mspace{11mu} {{{bp}\left( t_{k} \right)}/3}}} + {{{RTO}\left( {3\sqrt{\frac{3\mspace{11mu} {{bp}\left( t_{k} \right)}}{8}}} \right)}{p\left( {1 + {32{p^{2}\left( t_{k} \right)}}} \right)}}}},$

where s is the TCP segment size, RTT is the round trip time, RTO is the TCP retransmission timeout (e.g., set to 4RTT), and b is the maximum number of packets acknowledged by a single TCP acknowledgement. The actual maximum sending rate A that can be used may be limited by the available bandwidth estimate of the receiver, A_(r)(t_(k)):

$$A \leftarrow \min\left\{ A_{s}(t_{k}),\; A_{r}(t_{k}) \right\}$$

The packet loss rate p(t_(k)) may be measured by the receiver and/or fed back to the sender. From the above formula, the packet loss rate p(t_(k)) may be used by the MCU as a “knob” (e.g., a control point) to control the video sending rate (e.g., the video stream bit rate).
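
A direct transcription of the two formulas above into code, as a minimal sketch; the units (s in bytes, RTT in seconds, rates in kbps) and the default parameter values are assumptions for illustration.

```python
import math

def tfrc_rate(s, rtt, p, b=1):
    """TCP-friendly rate X(t_k) in bits/s.

    s: TCP segment size in bytes (hence the 8s numerator), rtt: round-trip
    time in seconds, p: packet loss rate, b: packets acknowledged per TCP
    ACK. RTO is set to 4*RTT, as stated in the text.
    """
    if p <= 0:
        return float("inf")
    rto = 4.0 * rtt
    denom = (rtt * math.sqrt(2.0 * b * p / 3.0)
             + rto * 3.0 * math.sqrt(3.0 * b * p / 8.0) * p * (1.0 + 32.0 * p ** 2))
    return 8.0 * s / denom

def sender_estimate(prev_estimate_kbps, p, s=1200, rtt=0.1, b=1):
    """Sender-side available-bandwidth update A_s(t_k), in kbps."""
    if p > 0.10:
        return max(tfrc_rate(s, rtt, p, b) / 1000.0,
                   prev_estimate_kbps * (1.0 - 0.5 * p))
    if p < 0.02:
        return 1.05 * (prev_estimate_kbps + 1.0)   # additive 1 kbps, then 5% up
    return prev_estimate_kbps

print(sender_estimate(2000.0, 0.01))   # low loss: increase the estimate
print(sender_estimate(2000.0, 0.15))   # heavy loss: back off
```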

A target bit rate may be used to determine p(t_(k)). The target bit rate may be determined based on a video codec, the size of the video to be rendered, and/or other information. The target bit rate may correspond to a human's perception limit. Calculation of the viewing angle may be based on the viewing distance and/or the size of the video. A bit rate corresponding to the minimum viewing angle that is greater than or equal to the calculated viewing angle may be found in a pre-computed table, such as Table 1:

TABLE 1 Example Viewing Angles

  resolution    width   height   bit rate (Kbps)   Ambient contrast   Viewing angle (°)
  “720p”        1280    720      3000
  “720p_A28”    1280    720      2700              200:1              28.74
  “720p_A16”    1280    720      2300              200:1              16.36
  “720p_A14”    1280    720      2000              200:1              14.33
  “480p”         854    480      1400
  “360p”         640    360       900
  “240p”         428    240       400

A viewing angle (e.g., in degrees) may be calculated as follows: (360/π) arctan(w/(2αd)), where α is the monitor resolution (in pixels per inch), w is the width of the video in pixels, d is the viewing distance (e.g., in inches), and arctan is the arc tangent function. For example, when the resolution of the video is 720p and the calculated angle is 15 degrees, then, based on Table 1, the desired bit rate may be 2300 Kbps.
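
The viewing-angle formula and the Table 1 lookup might be combined as sketched below (illustrative only; the table constant encodes just the 720p rows, and the function names are assumptions).

```python
import math

# 720p entries from Table 1: (viewing angle in degrees, bit rate in Kbps).
TABLE_720P = [(14.33, 2000), (16.36, 2300), (28.74, 2700)]
FULL_RATE_720P = 3000   # used when the angle exceeds all table entries

def viewing_angle_deg(width_px, distance_in, ppi):
    """Viewing angle (360/pi) * arctan(w / (2*alpha*d)) in degrees.

    width_px: video width in pixels, distance_in: viewing distance in
    inches, ppi: monitor resolution in pixels per inch (alpha).
    """
    return (360.0 / math.pi) * math.atan(width_px / (2.0 * ppi * distance_in))

def target_bitrate_kbps(angle_deg):
    """Bit rate of the minimum table angle that is >= the computed angle."""
    for min_angle, bitrate in TABLE_720P:
        if min_angle >= angle_deg:
            return bitrate
    return FULL_RATE_720P

angle = viewing_angle_deg(1280, 48.0, 100.0)   # roughly 15 degrees
print(angle, target_bitrate_kbps(angle))       # -> 2300 Kbps, as in the text
```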

The MCU may maintain a database including tables, one or more, or each, of which may correspond to a video codec (e.g., H.264/AVC, HEVC). A viewing parameter (e.g., the width and/or height of the video) may be obtained (e.g., determined) during call setup (e.g., H.245, SIP/SDP). The MCU may know the width and/or height of the video to be displayed to one or more, or each, participant.

The MCU may take a control system approach (e.g., when the MCU does not know the exact rate control algorithm implemented in the video sender). The MCU may adjust (e.g., incrementally increase and/or decrease) the reported packet loss rate. The MCU may adjust the reported packet loss rate until it observes a bit rate close to a target bit rate (e.g., a determined video stream bit rate). For example, suppose that the reported packet loss rate is p1 at time t1. The MCU may measure the bit rate (e.g., the actual bit rate) of the video stream. The MCU may adjust (e.g., increase) the reported packet loss rate to p2=p1+δ at time t2, for example, if the actual bit rate is higher than the target bit rate, among other reasons. The MCU may further increase the reported packet loss rate to p3=p1+2δ, for example, if the measured bit rate is still higher than the target bit rate, among other reasons. The MCU may determine a desired packet loss rate p2, for example, if the measured bit rate is now lower than the target bit rate, among other reasons. The MCU may delay the transmission of certain packets, such as the ACKs (e.g., to trick the video sender on the value of RTT). An increase in RTT may result in a decrease in the estimated bandwidth.
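
The δ-step adjustment described above resembles a simple feedback loop, sketched below (not from the original disclosure); the step size, iteration cap, and the toy sender model used for the demonstration are assumptions.

```python
def adapt_reported_loss(measured_bitrate_fn, true_loss_rate,
                        target_bitrate, delta=0.01, max_iters=20):
    """Incrementally raise the reported packet loss rate until the sender's
    observed bit rate falls at or below the target.

    measured_bitrate_fn: callable that reports the given loss rate to the
    sender, waits for it to react, and returns the newly measured bit rate
    (a stand-in for the MCU's measurement loop).
    """
    reported = true_loss_rate
    for _ in range(max_iters):
        bitrate = measured_bitrate_fn(reported)
        if bitrate <= target_bitrate:
            return reported
        reported += delta        # p2 = p1 + delta, p3 = p1 + 2*delta, ...
    return reported

# Toy sender model: bit rate shrinks as the reported loss rate grows.
fake_sender = lambda loss: 3000.0 * (1.0 - loss) ** 8
print(adapt_reported_loss(fake_sender, 0.0, 2300.0))   # settles near 0.04
```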

Tricking the video sender, as described herein, by generating and/or modifying feedback messages may be applied more generally to a scenario not involving an MCU. For example, a first device may receive a video stream from a second device. The first device may generate and/or may modify feedback messages sent from the first device to the second device using any of the techniques described herein (e.g., increasing or decreasing reported packet loss rates, delaying transmission of ACK packets, etc.) in order to influence the second device to modify the bit rate used by the second device to encode the video stream sent from the second device to the first device. For example, the first and second devices may be client endpoints in a video session not involving an MCU.

A device may signal the video stream bit rate by signaling a bandwidth limit. The device may include an MCU. The device may determine a first viewing parameter for the first device and a third viewing parameter for a third device. The first viewing parameter may be associated with the first video stream, which may be sent to the device from the first device. The third viewing parameter may be associated with a third video stream, which may be sent to the device from the third device. The device may determine a first video stream bit rate for the second video stream and/or a second video stream bit rate for the second device. The first video stream bit rate may be based on the first viewing parameter. The second video stream bit rate may be based on the third viewing parameter. The device may signal a bandwidth limit to the second device. The bandwidth limit may be associated with the first video stream bit rate and/or the second video stream bit rate. The bandwidth limit may control the bit rate of video encoded by the second device. The bandwidth limit may control the bit rate of video sent from the second device to the device.

The MCU may throttle the throughput (e.g., if the MCU acts as a router). The MCU may set a limit (e.g., a cap) on the bandwidth for the media flow to the MCU (e.g., to throttle the throughput). The MCU may determine the bandwidth limit (e.g., bandwidth cap) by a bit rate (e.g., a target bit rate) corresponding to the “worst-case” viewing condition (e.g., a nearest viewing distance) of the participants. A video sender may receive the bandwidth limit and may infer an available bandwidth lower than the actual bandwidth. For example, feedback from a receiving client may be sent to the video sender. The video sender may infer an available bandwidth based on the feedback from the receiving client. The RTCP protocol may include feedback from a receiver (e.g., a receiving client connected via the MCU to the video sender) that may indicate the received throughput (e.g., indicate effective bandwidth to a sender). The video sender may adjust the transmission rate (e.g., bit rate) to fit within the capacity of the network, for example, if the MCU sets a bandwidth limit. The MCU may increase the bandwidth limit allocated to the incoming media flow so that the video sender (e.g., S in FIG. 22) can increase its bit rate (e.g., when it may be useful for the target bit rate to be increased).

The MCU may intentionally drop packets from a video traffic flow whose bit rate may be higher than the bit rate corresponding to a “worst-case” viewing condition (e.g., the shortest of the viewing distances) of one or more, or all, clients that receive (e.g., watch) the video traffic flow. The intentional packet dropping rate may be reduced (e.g., when the target bit rate increases).

The MCU may utilize signaling to instruct the sending client what video bit rate to send. The MCU may utilize signaling to inform the sending client of a maximum bit rate for sending the video content. The signaling may be proprietary signaling. The proprietary signaling may specify a target bit rate and/or a maximum bit rate for the video content. The MCU and/or the clients may utilize a standard signaling mechanism for signaling the video bit rate. For example, the MCU may use an H.245 Flow Control command to instruct the sending client of a maximum bit rate to use for the logical channel which carries video from the sending client to the MCU. The MCU may use the H.245 Flow Control command if the call session between the MCU and a client terminal may be based on the H.323 standard. The MCU may influence and/or control the bit rate used by the sending client without the need to drop packets and/or alter the RTCP feedback reports.

One or more clients may provide a UAV viewing parameter to the MCU. The UAV viewing parameter may include viewing distance, user presence, user attentiveness, ambient illuminance, and/or display properties such as display size and/or display resolution. The UAV viewing parameter may be signaled from the client to the MCU. For example, the UAV viewing parameter may be signaled using a call setup protocol and/or a call control protocol (e.g., H.245, SIP, etc.). The MCU may use the UAV viewing parameter to modify (e.g., adapt) the encoding of video content sent to the client (e.g., if the UAV viewing parameter is explicitly signaled by a client). The UAV viewing parameters which may be determined (e.g., derived) from face detection and/or other monitoring of the video may be sent from a client (e.g., explicitly provided by the client).

UAV via traffic shaping may be implemented in a router or a similar network entity. For example, UAV may be performed inside a router and/or a similar network entity which may not have transcoding capabilities. UAV may be performed in a router instead of inside an MCU. The network entity may include an Access Point (AP) in a Wi-Fi network, an eNB, or a P-GW in an LTE network. The video traffic may flow in both directions. The video traffic may go through a common network entity. The common network entity may include a gateway type of device such as an AP, an eNB, or a P-GW.

The UAV via traffic shaping in a network entity system architecture may be similar to the architecture illustrated in FIG. 21. A network entity may decode video content. The network entity may perform face detection on the decoded video content. The network entity may use the output of the face detector for a first video stream (e.g., traffic flow) to shape a second video stream (e.g., traffic flow) in the other direction. The video traffic shaping techniques may include those techniques described herein. UAV via traffic shaping may intentionally drop/delay/mark video packets that pass through the network entity. The video traffic shaping method may include tricking the video sender into decreasing a sending bit rate. For example, the network entity may trick the video sender into decreasing the sending bit rate by adjusting (e.g., intercepting and/or modifying) the packet loss feedback report. The modified packet loss feedback report may show a higher packet loss rate than the rate actually observed by the receiver (e.g., the receiving endpoint). The packet loss feedback report may include an RTCP receiver report.

One or more clients may provide a UAV viewing parameter to the MCU. The UAV viewing parameter may include viewing distance, user presence, user attentiveness, ambient illuminance, and/or display properties such as display size and/or display resolution. The UAV viewing parameter may be signaled from the client to the MCU. For example, the UAV viewing parameter may be signaled using a call setup protocol or a call control protocol (e.g., H.245, SIP, etc.). The MCU may modify (e.g., adapt) the encoding of video content sent to that client based on the UAV viewing parameter. The MCU may modify the encoding of video content if such parameters are explicitly signaled by a client. A UAV viewing parameter which may be derived from face detection and/or other monitoring of the video sent from a client may be provided (e.g., explicitly provided) by the client.

UAV may be performed via the “cloud.” UAV via transcoding and/or UAV via traffic shaping may achieve UAV without requiring any changes to the client.

UAV may be implemented without degrading the perceived video quality. UAV may be implemented without making any changes to the client. The client may encode the content (e.g., video content) directly, perhaps based on a UAV viewing parameter. The UAV viewing parameter may include the viewer's viewing distance and/or circumstance, such as lighting conditions, among others. The client may send the content directly to the peer clients. The client may send information to a server. A UAV program may run on the server. The server may control the client “on-the-fly” (e.g., in real time). The server may send commands to the client. The client may respond to the commands sent from the server to achieve UAV. The server (e.g., the UAV program on the server) may send a command and/or a request to the client. The command and/or request may instruct the client to modify the video encoding/sending bit rate, change the video resolution at which the client sends video, prefilter and/or remove some level of detail from the video before encoding it, and/or otherwise adapt the video content sent.
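
The following is a minimal sketch of such a server-driven control loop on the client side, expressed with today's browser WebRTC APIs (RTCRtpSender.setParameters and MediaStreamTrack.applyConstraints). The command schema is a hypothetical example, not a defined protocol, and the APIs shown are one possible way to act on the commands rather than the only implementation.

// Handle UAV commands pushed from the server over a WebSocket control channel.
interface UavCommand {
  type: "setBitrate" | "setResolution";
  maxBitrateBps?: number;
  width?: number;
  height?: number;
}

function listenForUavCommands(ctrl: WebSocket, sender: RTCRtpSender, track: MediaStreamTrack): void {
  ctrl.onmessage = async (ev: MessageEvent) => {
    const cmd: UavCommand = JSON.parse(ev.data);
    if (cmd.type === "setBitrate" && cmd.maxBitrateBps) {
      const p = sender.getParameters();
      if (p.encodings && p.encodings[0]) {
        p.encodings[0].maxBitrate = cmd.maxBitrateBps; // cap the encoder's sending rate
        await sender.setParameters(p);
      }
    } else if (cmd.type === "setResolution" && cmd.width && cmd.height) {
      await track.applyConstraints({ width: cmd.width, height: cmd.height }); // dynamic resolution change
    }
  };
}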

Client-to-server communications may be supported by WebSocket and JavaScript. A client that responds to server commands may be supported by Chrome, Firefox, or Opera with WebRTC. Other browsers may be supported by installing WebRTC plugins, for example. WebRTC (Web Real-Time Communication) is an API definition being drafted by the World Wide Web Consortium (W3C). WebRTC enables browser-to-browser applications for voice calling, video chat, and P2P file sharing without plugins. FIG. 23 shows an example architecture of a WebRTC system.

One or more mechanisms or techniques may be implemented to send information from the client to the server and/or to control the client from the server “on-the-fly.” Referring to FIG. 23, UAV may be implemented in an APP server. The APP server may determine (e.g., estimate) viewing conditions. The viewing conditions may include face detection, ambient lighting estimation, etc. The APP server may run along with an HTTP server, a SIP server, an application server, or another device located somewhere in the cloud.

One or more browser clients may share the same APP server. One or more browser clients may each have their own APP server (e.g., to achieve UAV). An APP server may communicate with a second APP server (e.g., in order to enable UAV for a video conferencing session). For example, the APP servers may communicate via a signaling path. Client-to-client and/or client-to-APP-server signaling may facilitate communication between APP servers. For example, a first client may identify a first APP server to a second client during the setup of a video communication session. The second client may identify the first APP server of the first client to a second (e.g., its own) APP server. The first APP server and the second APP server may discover each other and may begin communicating. The first client may introduce the first APP server to the second client and/or the second APP server. The second client may introduce the second APP server to the first client and/or the first APP server.
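
A sketch of that introduction step is shown below. The message shapes and the appServerUrl field are illustrative assumptions about how a client might carry its APP server's address inside session setup signaling.

// Hypothetical session-setup messages that carry each side's APP server address.
interface SessionInvite {
  type: "invite";
  callerId: string;
  appServerUrl?: string;   // the first client introduces its APP server
}

// On receiving the invite, the callee forwards the caller's APP server URL to
// its own APP server, which can then open a signaling connection to it.
function handleInvite(invite: SessionInvite, ownAppServerCtrl: WebSocket): void {
  if (invite.appServerUrl) {
    ownAppServerCtrl.send(JSON.stringify({ type: "peerAppServer", url: invite.appServerUrl }));
  }
}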

The communications between a server and a client may implement WebSocket. WebSocket provides for full-duplex communication. Alternatively, full-duplex communication between the client and server may be provided by XMLHttpRequest (XHR) and the Google App Engine Channel API. Google App Engine may enable building and/or running applications on Google's infrastructure.

Sensor information from the client may be collected and/or communicated to the server via JavaScript (e.g., since the clients may be web browsers). JavaScript collection and communication of sensor information may be supported in Windows and/or Linux. The collection may include screen captures, parsed multimedia from the compressed media, and/or samples of frame captures from the camera output. The collection and/or transmission via WebSocket to the server may enable the server to perform the computer-vision-related functions and/or offload those functions from the client.
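
A minimal sketch of sampling frames from the local camera preview and shipping them to the UAV server over WebSocket, so that face detection and lighting estimation can run server-side, follows. The element ID, sampling interval, and JPEG quality are illustrative assumptions.

// Periodically capture a frame from the local preview video and send it to the server.
function startFrameSampling(ctrl: WebSocket, intervalMs = 2000): number {
  const video = document.getElementById("localPreview") as HTMLVideoElement;
  const canvas = document.createElement("canvas");
  return window.setInterval(() => {
    if (!video.videoWidth) return;         // camera not ready yet
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext("2d")!.drawImage(video, 0, 0);
    canvas.toBlob((blob) => {              // JPEG keeps the upload small
      if (blob && ctrl.readyState === WebSocket.OPEN) ctrl.send(blob);
    }, "image/jpeg", 0.7);
  }, intervalMs);
}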

A client may utilize WebRTC to respond to a command from a server. The encoding may be performed in real time, and APIs may enable adjusting the frame resolution. WebRTC may adjust (e.g., adapt) the video resolution during capturing and/or encoding. A first adjustment (e.g., first adaptation) may be based on the camera resolution (e.g., the VideoAdapter::AdaptFrame method). A second adjustment (e.g., second adaptation) may be based on channel conditions and/or buffer fullness (e.g., via the resize_key_frame function).

A first resolution adaptation (e.g., based on the camera resolution) may be utilized to achieve a dynamic resolution change. The first resolution adaptation may be based on one or more commands from the server to the client. A getUserMedia API for WebRTC defined by the W3C may enable the dynamic resolution change. A second resolution adaptation in WebRTC may be utilized. The second resolution adaptation may require changes to the WebRTC stack inside the encoder loop. Bit rate adaptation may be utilized. Bit rate adaptation may be utilized when the encoding bit rate can be set and/or influenced by a WebRTC client.
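
One way to realize the getUserMedia-based (first) resolution adaptation in current browsers is sketched below: on a server command, the camera is reacquired at the requested resolution and the outgoing track is swapped with RTCRtpSender.replaceTrack. The constraint values would come from the UAV server; this is one possible realization rather than the mechanism the original system used.

// Reacquire the camera at a new resolution and swap the sent track without renegotiation.
async function switchCaptureResolution(sender: RTCRtpSender, width: number, height: number): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { width: { ideal: width }, height: { ideal: height } },
  });
  const newTrack = stream.getVideoTracks()[0];
  const oldTrack = sender.track;
  await sender.replaceTrack(newTrack);  // the encoder now sees the new capture resolution
  oldTrack?.stop();                     // release the previous capture
}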

A WebRTC app may use multi-party connections (e.g., multiple RTCPeerConnections) so that one or more, or every, endpoint may connect to one or more, or every, other endpoint in a mesh configuration. An example multi-party connection mesh configuration is illustrated in FIG. 24. Applications (e.g., talky.io) may work well for a small handful of peers. The UAV APP may run somewhere in the cloud. The UAV APP may send multiple commands to one or more, or each, endpoint (e.g., one for each of their connections). A WebRTC application may select one endpoint to distribute streams to one or more, or all, others. The WebRTC application may distribute streams in a star configuration. A WebRTC endpoint may run on a server. The WebRTC endpoint may cooperate with a unique and/or proprietary redistribution mechanism.
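
A minimal sketch of the mesh topology of FIG. 24 follows: one RTCPeerConnection per remote peer, with the local stream added to each. Offer/answer and ICE candidate exchange are omitted and assumed to be handled by separate signaling code.

// Build one peer connection per remote peer (n-1 connections per endpoint).
function buildMesh(localStream: MediaStream, remotePeerIds: string[]): Map<string, RTCPeerConnection> {
  const connections = new Map<string, RTCPeerConnection>();
  for (const peerId of remotePeerIds) {
    const pc = new RTCPeerConnection();
    localStream.getTracks().forEach((t) => pc.addTrack(t, localStream));
    connections.set(peerId, pc);
  }
  return connections;
}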

For multi-party connections, a UAV APP may be run on the server when a video mixer is not used. When a video mixer is used, a worst-case selection amongst viewing conditions may be made to reduce the bit rate. The worst-case selection may not impact the quality of a critical viewer (e.g., the most critical viewer, perhaps the viewer with the shortest viewing distance). An MCU may be used for multi-party connections with a UAV APP run on a server. The UAV APP server may be run in the same node as the MCU. UAV may be implemented without transcoding.
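
The worst-case selection can be illustrated with a short sketch: when a single mixed stream is sent to all viewers, the bit rate is set to the highest rate required by any viewer so the most critical viewer (e.g., the one with the shortest viewing distance) is not degraded. The mapping from viewing conditions to a required bit rate is assumed to be computed elsewhere.

// Choose a bit rate for the single mixed stream from per-viewer requirements.
interface ViewerCondition {
  viewingDistanceCm: number;
  requiredBitrateKbps: number;  // assumed to be derived from the viewing conditions
}

function selectMixedStreamBitrate(viewers: ViewerCondition[], defaultKbps: number): number {
  if (viewers.length === 0) return defaultKbps;
  return Math.max(...viewers.map((v) => v.requiredBitrateKbps)); // worst-case (most demanding) viewer wins
}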

The processes and instrumentalities described herein may apply in any combination, may apply to other wireless technologies, and may apply to other services (e.g., not limited to proximity services).

A WTRU may refer to an identity of the physical device, or to the user's identity such as subscription-related identities, e.g., MSISDN, SIP URI, etc. A WTRU may refer to application-based identities, e.g., user names that may be used per application.

The processes described above may be implemented in a computer program, software, and/or firmware incorporated in a computer-readable medium for execution by a computer and/or processor. Examples of computer-readable media include, but are not limited to, electronic signals (transmitted over wired and/or wireless connections) and/or computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as, but not limited to, internal hard disks and removable disks, magneto-optical media, and/or optical media such as CD-ROM disks and/or digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, and/or any host computer.

1-44. (canceled)
 45. A method of controlling a video communication, the method comprising: receiving a first video stream from a first device and a second video stream from a second device; determining a viewing parameter associated with the first device by analyzing the first video stream; modifying, based on the viewing parameter, the second video stream; and sending the modified second video stream to the first device.
 46. The method of claim 45, wherein modifying the second video stream comprises at least one of: re-encoding the second video stream, adjusting an orientation, removing a video detail, or adjusting a bit rate.
 47. The method of claim 45, wherein the viewing parameter is a first viewing parameter, and the method further comprising: determining a second viewing parameter associated with the second device by analyzing the second video stream; modifying, based on the second viewing parameter, the first video stream, wherein modifying the first video stream comprises at least one of: re-encoding the first video stream, adjusting an orientation, removing a video detail, or adjusting a bit rate; and sending the modified first video stream to the second device.
 48. The method of claim 45, wherein the viewing parameter is a first viewing parameter, the method further comprising: receiving a third video stream from a third device; receiving a fourth video stream from the second device; determining a third viewing parameter associated with the third device by analyzing the third video stream; modifying the fourth video stream based on the third viewing parameter, wherein modifying the fourth video stream further comprises comparing bit rates associated with the first viewing parameter and the third viewing parameter, and wherein modifying the fourth video stream comprises, on a condition that the third viewing parameter is associated with a higher bit rate than the first viewing parameter, adjusting a bit rate associated with the fourth video stream based on the third viewing parameter; and sending the modified fourth video stream to the third device.
 49. The method of claim 45, wherein the viewing parameter comprises a user viewing parameter that comprises at least one of: a user's presence, a user's location with respect to a screen of the first device, a user's orientation with respect to a screen of the first device, a user's viewing angle with respect to a screen of the first device, a user's distance from a screen of the first device, a user's visual acuity, an ambient lighting condition, a number of users viewing a screen of the first device, or a user's point of attention.
 50. The method of claim 45, wherein the viewing parameter comprises a device viewing parameter that comprises at least one of: size of a screen of the first device, contrast of a screen of the first device, brightness of a screen of the first device, pixel density of a screen of the first device, size of a window displaying multimedia content on the first device, setup of a camera on the first device, or a location of a window displaying the multimedia content on the first device.
 51. The method of claim 45, wherein the viewing parameter comprises a content viewing parameter that comprises at least one of: contrast, color gamut, or range of depth of three-dimensional content.
 52. A device for controlling a video communication, the device configured at least in part to: receive a first video stream from a first device and a second video stream from a second device; determine a viewing parameter associated with the first device based on an analysis of the first video stream; modify, based on the viewing parameter, the second video stream; and send the modified second video stream to the first device.
 53. The device of claim 52, wherein being configured to modify the second video stream comprises being configured to at least one of: re-encode the second video stream, adjust an orientation, remove a video detail, or adjust a bit rate.
 54. The device of claim 52, wherein the viewing parameter is a first viewing parameter, the device further configured to: determine a second viewing parameter associated with the second device based on an analysis of the second video stream; modify, based on the second viewing parameter, the first video stream, wherein being configured to modify the first video stream comprises being configured to at least one of: re-encode the first video stream, adjust an orientation, remove a video detail, or adjust a bit rate; and send the modified first video stream to the second device.
 55. The device of claim 52, wherein the viewing parameter is a first viewing parameter, the device further configured to: receive a third video stream from a third device; receive a fourth video stream from the second device; determine a third viewing parameter associated with the third device based on an analysis of the third video stream; modify the fourth video stream based on the third viewing parameter, wherein being configured to modify the fourth video stream further comprises being configured to compare bit rates associated with the first viewing parameter and the third viewing parameter, and wherein being configured to modify the fourth video stream comprises, on a condition that the third viewing parameter is associated with a higher bit rate than the first viewing parameter, being configured to adjust a bit rate associated with the fourth video stream based on the third viewing parameter; and send the modified fourth video stream to the third device.
 56. The device of claim 52, wherein the viewing parameter comprises a user viewing parameter that comprises at least one of: a user's presence, a user's location with respect to a screen of the first device, a user's orientation with respect to a screen of the first device, a user's viewing angle with respect to a screen of the first device, a user's distance from a screen of the first device, a user's visual acuity, an ambient lighting condition, a number of users viewing a screen of the first device, or a user's point of attention.
 57. The device of claim 52, wherein the viewing parameter comprises a device viewing parameter that comprises at least one of: size of a screen of the first device, contrast of a screen of the first device, brightness of a screen of the first device, pixel density of a screen of the first device, size of a window displaying multimedia content on the first device, setup of a camera on the first device, or a location of a window displaying the multimedia content on the first device.
 58. The device of claim 52, wherein the viewing parameter comprises a content viewing parameter that comprises at least one of: contrast, color gamut, or range of depth of three-dimensional content.
 59. The device of claim 52, wherein the device comprises a multipoint control unit (MCU).
 60. A method of controlling a video communication, the method comprising: receiving a first video stream from a first device; determining a viewing parameter associated with the first device by analyzing the first video stream; determining, based on the viewing parameter, a video stream bit rate for a second video stream from a second device; and indicating the video stream bit rate to the second device.
 61. The method of claim 60, wherein indicating the video stream bit rate comprises: receiving the second video stream from the second device; measuring a packet loss rate for the second video stream; determining an adjusted packet loss rate, associated with the determined video stream bit rate, that differs from the measured packet loss rate; generating a feedback message that indicates the adjusted packet loss rate; and sending the feedback message to the second device.
 62. The method of claim 60, wherein the viewing parameter is a first viewing parameter, and wherein the video stream bit rate is a first video stream bit rate for the second video stream, and wherein indicating the video stream bit rate comprises: receiving a third video stream from a third device; determining a third viewing parameter associated with the third device by analyzing the third video stream; determining, based on the third viewing parameter, a second video stream bit rate for the second video stream; and signaling a bandwidth limit, associated with the first video stream bit rate and the second video stream bit rate, to the second device.
 63. The method of claim 60, further comprising receiving the second video stream from the second device, wherein indicating the video stream bit rate comprises removing one or more packets from the second video stream before sending the second video stream to the first device.
 64. The method of claim 60, wherein the viewing parameter comprises at least one of a user viewing parameter, a device viewing parameter, or a content viewing parameter.